
Spatio-Clock Synchronous Constraint Guided Safe Reinforcement Learning for Autonomous Driving

Wang Jinyong, Huang Zhiqiu, Yang Deyan, Xiaowei Huang, Zhu Yi, Hua Gaoyang

Citation: Wang Jinyong, Huang Zhiqiu, Yang Deyan, Xiaowei Huang, Zhu Yi, Hua Gaoyang. Spatio-Clock Synchronous Constraint Guided Safe Reinforcement Learning for Autonomous Driving[J]. Journal of Computer Research and Development, 2021, 58(12): 2585-2603. DOI: 10.7544/issn1000-1239.2021.20211023

Funds: This work was supported by the National Key Research and Development Program of China (2018YFB1003900) and the National Natural Science Foundation of China (61772270, 62077029).
Details
  • CLC number: TP181
  • Abstract: Autonomous driving systems integrate complex interactions between hardware and software. To ensure safe and reliable operation, formal methods are used at the design stage to provide rigorous guarantees that logical specifications and safety-critical requirements are satisfied. As a widely employed machine learning architecture, deep reinforcement learning (DRL) finds an optimal policy that maximizes the cumulative discounted reward obtained by interacting with the environment, and has been applied to autonomous driving decision-making modules. However, black-box DRL-based autonomous driving systems can guarantee neither safe operation nor the interpretability of reward definitions for complex tasks, especially when they face unfamiliar situations and must reason over many options. To address these problems, spatio-clock synchronous constraints are adopted to improve the safety and interpretability of DRL. First, we propose a dedicated formal property specification language for the autonomous driving domain, the spatio-clock synchronous constraint specification language, whose near-natural-language requirement specifications make the reward-generation process more interpretable. Second, we present domain-specific spatio-clock synchronous automata that describe spatio-clock autonomous behaviors, i.e., controllers for space- and clock-critical actions, together with safe state-action space transition systems that guarantee the safety of the DRL policy-generation process. Third, building on the formal specification and policy learning, we propose a formal spatio-clock synchronous constraint guided safe reinforcement learning method whose safe reward function is easy to understand. Finally, we demonstrate the effectiveness of the proposed approach through an autonomous lane-changing and overtaking case study in a highway scenario.
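The abstract describes an approach in which formal spatio-clock constraints both restrict the learner's state-action space and make the reward function interpretable. As a rough illustration only, and not the authors' implementation, the indented Python sketch below shows one generic way such a constraint can serve both as an action filter and as a reward-shaping penalty in a highway lane-change setting; EgoState, MIN_GAP, CLOCK_BOUND, safe_actions, and shaped_reward are hypothetical names and thresholds introduced here for illustration.

    from dataclasses import dataclass


    @dataclass
    class EgoState:
        gap_to_front: float       # headway to the leading vehicle (m)
        lane_change_clock: float  # time elapsed since the current lane change began (s)


    # Hypothetical spatio-clock constraint (illustrative thresholds, not from the paper):
    # keep at least MIN_GAP metres of headway, and complete any lane change
    # within CLOCK_BOUND seconds.
    MIN_GAP = 10.0
    CLOCK_BOUND = 3.0


    def constraint_satisfied(s: EgoState) -> bool:
        return s.gap_to_front >= MIN_GAP and s.lane_change_clock <= CLOCK_BOUND


    def safe_actions(s, actions, predict):
        # Keep only actions whose one-step predicted successor still satisfies the
        # constraint; fall back to a default safe action if none remain.
        allowed = [a for a in actions if constraint_satisfied(predict(s, a))]
        return allowed or ["keep_lane"]


    def shaped_reward(task_reward, s_next, penalty=5.0):
        # Task reward minus a fixed penalty whenever the constraint is violated,
        # steering the learner away from unsafe regions during training.
        return task_reward - (0.0 if constraint_satisfied(s_next) else penalty)


    if __name__ == "__main__":
        # Toy one-step predictor standing in for an environment model.
        def predict(s, a):
            return EgoState(
                gap_to_front=s.gap_to_front - (2.0 if a == "accelerate" else 0.0),
                lane_change_clock=s.lane_change_clock + (1.0 if a == "change_left" else 0.0),
            )

        s = EgoState(gap_to_front=11.0, lane_change_clock=2.5)
        print(safe_actions(s, ["keep_lane", "accelerate", "change_left"], predict))  # ['keep_lane']
        print(shaped_reward(1.0, predict(s, "accelerate")))                          # -4.0

In this generic scheme, the action filter keeps exploration inside the constraint-satisfying state-action space, while the shaped reward penalizes predicted violations; this is one common way to connect formal constraints to an off-the-shelf DRL learner, not necessarily the construction used in the paper.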
  • Journal citations (3)

    1. Jiang Rongjun. A real-time planning method for autonomous vehicles based on the Concenter-Net neural network. 数学的实践与认识 (Mathematics in Practice and Theory). 2023(05): 164-171
    2. Liu Zerun, Liu Chao. Machine learning applications in sustainable built environment research: progress and prospects. 风景园林 (Landscape Architecture). 2023(07): 51-59
    3. Sun Cong, Zeng Huiming, Song Huandong, Wang Yunbai, Zhang Zongxu, Ma Jianfeng. A machine learning based online detection and recovery method for UAV sensor attacks. Journal of Computer Research and Development. 2023(10): 2291-2303

    Other citations (15)

Metrics
  • Article views:  718
  • Full-text HTML views:  3
  • PDF downloads:  541
  • Citations: 18
Publication history
  • Published online:  2021-11-30
