Spatio-Clock Synchronous Constraint Guided Safe Reinforcement Learning for Autonomous Driving

Wang Jinyong; Huang Zhiqiu; Yang Deyan; Xiaowei Huang; Zhu Yi; Hua Gaoyang

doi:10.7544/issn1000-1239.2021.20211023

Citation: Wang Jinyong, Huang Zhiqiu, Yang Deyan, Xiaowei Huang, Zhu Yi, Hua Gaoyang. Spatio-Clock Synchronous Constraint Guided Safe Reinforcement Learning for Autonomous Driving[J]. Journal of Computer Research and Development, 2021, 58(12): 2585-2603. DOI: 10.7544/issn1000-1239.2021.20211023

Abstract

Autonomous driving systems involve complex interactions between hardware and software. To ensure safe and reliable operation, formal methods are used at the design stage to provide rigorous guarantees that logical specifications and safety-critical requirements are satisfied. Deep reinforcement learning (DRL), a widely used machine learning approach, seeks an optimal policy that maximizes the cumulative discounted reward through interaction with the environment, and it has been applied to decision-making modules in autonomous driving. However, black-box DRL-based autonomous driving systems cannot guarantee safe operation, and they lack techniques for making reward definitions interpretable in complex tasks, especially when they face unfamiliar situations and must reason about a large number of options. To address these problems, we adopt spatio-clock synchronous constraints to improve the safety and interpretability of DRL. First, we propose a formal property specification language dedicated to the autonomous driving domain, the spatio-clock synchronous constraint specification language, and present a domain-specific knowledge requirements specification that is close to natural language, which makes the reward function generation process more interpretable. Second, we present domain-specific spatio-clock synchronous automata to describe spatio-clock autonomous behaviors, i.e., controllers related to certain spatio- and clock-critical actions, and present safe state-action space transition systems to guarantee the safety of the process that generates the DRL optimal policy. Third, building on the formal specification and policy learning, we propose a formal spatio-clock synchronous constraint guided safe reinforcement learning method whose safe reward function is easy to understand. Finally, we demonstrate the effectiveness of the proposed approach through a case study of autonomous lane changing and overtaking in a highway scenario.
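The abstract does not give the syntax of the specification language or the structure of the automata, so the following is only a toy sketch of the general idea it describes: restricting a learned policy to a safe state-action space using a spatial condition (a minimum gap) and a clock condition (a minimum number of ticks between lane changes), alongside the cumulative discounted reward that DRL maximizes. Every name and threshold here (`State`, `safe_actions`, `MIN_GAP_M`, `MIN_TICKS_BETWEEN_CHANGES`, the action set) is a hypothetical assumption made for illustration and is not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Hypothetical, simplified highway state (not the paper's state space)."""
    gap_ahead_m: float        # distance to the vehicle ahead in the current lane
    gap_target_lane_m: float  # smallest gap to vehicles in the target lane
    ticks_since_change: int   # clock ticks since the last lane change


# Illustrative thresholds only; the paper's actual constraints are not given here.
MIN_GAP_M = 10.0
MIN_TICKS_BETWEEN_CHANGES = 5


def safe_actions(state):
    """Return the actions allowed by the toy spatial and clock constraints."""
    allowed = ["brake"]  # braking is treated as always permitted in this toy model
    if state.gap_ahead_m >= MIN_GAP_M:
        allowed.append("keep_lane")
    if (state.gap_target_lane_m >= MIN_GAP_M
            and state.ticks_since_change >= MIN_TICKS_BETWEEN_CHANGES):
        allowed.append("change_left")
    return allowed


def discounted_return(rewards, gamma=0.99):
    """Cumulative discounted reward: sum over t of gamma**t * r_t."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


def greedy_safe_action(state, q_values):
    """Pick the highest-valued action among those the constraints allow."""
    return max(safe_actions(state), key=lambda a: q_values[a])
```

In this sketch the learned values still rank the actions, but the constraint check removes unsafe ones before selection, so `keep_lane` is never chosen when the gap ahead is below the threshold, regardless of its estimated value.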
