ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2014, Vol. 51, Issue (12): 2644-2652. doi: 10.7544/issn1000-1239.2014.20131011

• Artificial Intelligence •

Convergence Analysis of the Adaptive Hazard Avoidance Decision Process for Unmanned Surface Vehicles

Zhang Rubo1,2, Tang Pingpeng1,3, Yang Ge1, Li Xueyao1, Shi Changting1

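  1(College of Computer Science and Technology, Harbin Engineering University, Harbin 150001); 2(College of Electromechanical and Information Engineering, Dalian Nationalities University, Dalian, Liaoning 116600); 3(Wuhan Second Ship Design and Research Institute, Wuhan 430064) (tpinheu@163.com)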
  • Published: 2014-12-01
  • Funding: National Natural Science Foundation of China (60975071, 61100005, 60975019)

Convergence Analysis of Adaptive Obstacle Avoidance Decision Processes for Unmanned Surface Vehicle

Zhang Rubo1,2, Tang Pingpeng1,3, Yang Ge1, Li Xueyao1, Shi Changting1   

  1(College of Computer Science and Technology, Harbin Engineering University, Harbin 150001); 2(College of Electromechanical and Information Engineering, Dalian Nationalities University, Dalian, Liaoning 116600); 3(Wuhan Second Ship Design and Research Institute, Wuhan 430064)
  • Online: 2014-12-01

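Abstract (Chinese version): The unmanned surface vehicle (USV) is an important autonomous marine robot that is currently being widely studied and is gradually being deployed in practice. However, safe navigation remains a serious obstacle to improving USV autonomy; in particular, hazard avoidance in complicated sea states urgently needs to be solved. Based on the Sarsa on-policy reinforcement learning algorithm, this paper proposes an adaptive hazard avoidance decision model for USVs operating in complicated sea states. With a GLIE (greedy in the limit with infinite exploration) policy used for action exploration, the USV adaptive hazard avoidance decision process is proved to converge to the optimal action policy with probability 1. This result shows that using on-policy reinforcement learning to improve USV hazard avoidance in complicated sea states is feasible.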

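Key words (Chinese version): unmanned surface vehicle, complicated sea state, Sarsa on-policy reinforcement learning, adaptive hazard avoidance decision process, GLIE (asymptotically greedy) policy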
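For reference, the standard tabular Sarsa update and the textbook sufficient conditions under which Sarsa with a GLIE exploration policy converges to the optimal action-value function with probability 1 are summarized below. This is the generic formulation (finite state and action sets, bounded reward variance, discount 0 ≤ γ < 1), not a reproduction of the paper's own derivation; the symbols s, a, r, γ, α_t, ε_t, c and n_t(s) are generic and the paper's USV-specific state, action and reward definitions may impose different or additional conditions.

```latex
% Tabular Sarsa update; a_{t+1} is drawn from the same (GLIE) policy being learned
Q_{t+1}(s_t,a_t) = Q_t(s_t,a_t) + \alpha_t(s_t,a_t)\left[r_{t+1} + \gamma\,Q_t(s_{t+1},a_{t+1}) - Q_t(s_t,a_t)\right]

% Step-size conditions, required for every state-action pair
\sum_{t=0}^{\infty}\alpha_t(s,a) = \infty, \qquad \sum_{t=0}^{\infty}\alpha_t^2(s,a) < \infty, \qquad 0 \le \alpha_t(s,a) \le 1

% One GLIE schedule: epsilon-greedy, where n_t(s) counts visits to state s so far
\varepsilon_t(s) = \frac{c}{n_t(s)}, \qquad 0 < c < 1

% Conclusion under the conditions above
\Pr\left[\lim_{t\to\infty} Q_t(s,a) = Q^*(s,a)\ \ \forall (s,a)\right] = 1
```

The harmonic schedule keeps Σ_t ε_t(s) divergent, so every action is still tried infinitely often, while ε_t(s) → 0 makes the policy greedy in the limit; these are exactly the two GLIE requirements.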

Abstract: The unmanned surface vehicle (USV) is an important class of autonomous marine robot that has been widely studied and is gradually being applied in practice. However, USV autonomy is still limited by the performance of autonomous navigation technology; in particular, adaptive obstacle avoidance in complicated sea-state marine environments is an urgent open problem. This paper proposes an adaptive obstacle avoidance decision process model for USVs operating in such environments. Informed by an analysis of the disturbance factors present in complicated sea states, the model is built on the Sarsa on-policy reinforcement learning algorithm. Using a GLIE (greedy in the limit with infinite exploration) policy for action exploration, we prove that the adaptive avoidance decision process converges: the learned action policy converges to the optimal action policy with probability 1. This result indicates that on-policy reinforcement learning is a feasible way to improve USV obstacle avoidance performance in complicated sea-state marine environments.

Key words: unmanned surface vehicle (USV), complicated sea-state, Sarsa on-policy reinforcement learning, adaptive obstacle avoidance decision process, greedy in the limit with infinite exploration (GLIE)
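The core mechanism in the abstract, Sarsa driven by a GLIE exploration schedule, can be illustrated with a minimal tabular sketch. It assumes a hypothetical 1-D corridor task rather than the paper's USV model (whose states, actions and rewards are not given here); the corridor, the step-size exponent 0.7, c = 0.9, γ = 0.9, the episode count and all function names are illustrative choices, not taken from the paper. The step size n^-0.7 satisfies Σα = ∞ and Σα² < ∞, and ε_t(s) = c/n_t(s) is a GLIE schedule.

```python
import random

# Hypothetical toy task (NOT the paper's USV model): a 1-D corridor of
# N_STATES cells. The agent starts in cell 0 and must reach GOAL.
# Action 0 moves left (blocked at cell 0), action 1 moves right.
# Every step costs -1; reaching GOAL ends the episode.
N_STATES = 5
ACTIONS = (0, 1)
GOAL = N_STATES - 1


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    nxt = max(state - 1, 0) if action == 0 else state + 1
    return nxt, -1.0, nxt == GOAL


def glie_epsilon_greedy(q, state, visits, rng, c=0.9):
    """Epsilon-greedy with epsilon_t(s) = c / n_t(s), 0 < c < 1 (a GLIE schedule)."""
    eps = c / visits[state]
    if rng.random() < eps:
        return rng.choice(ACTIONS)
    best = max(q[state])
    # Break ties uniformly at random among the greedy actions.
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def sarsa_glie(episodes=3000, gamma=0.9, seed=0, max_steps=200):
    """Tabular on-policy Sarsa with decaying step sizes and GLIE exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]      # tabular Q(s, a)
    visits = [0] * N_STATES                         # n_t(s), drives epsilon
    sa_counts = [[0, 0] for _ in range(N_STATES)]   # n_t(s, a), drives alpha

    for _ in range(episodes):
        s = 0
        visits[s] += 1
        a = glie_epsilon_greedy(q, s, visits, rng)
        for _ in range(max_steps):
            s2, r, done = step(s, a)
            sa_counts[s][a] += 1
            # alpha = n^-0.7: its sum diverges while the sum of squares converges.
            alpha = sa_counts[s][a] ** -0.7
            if done:
                target = r  # the terminal state contributes value 0
            else:
                visits[s2] += 1
                # On-policy: a2 comes from the same GLIE policy being learned.
                a2 = glie_epsilon_greedy(q, s2, visits, rng)
                target = r + gamma * q[s2][a2]
            q[s][a] += alpha * (target - q[s][a])
            if done:
                break
            s, a = s2, a2
    return q


def greedy_policy(q):
    """Greedy action for every non-goal cell (1 = move right)."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]


if __name__ == "__main__":
    learned_q = sarsa_glie()
    print(greedy_policy(learned_q))
```

Because the next action a2 is sampled from the exploring policy itself, the learned values track that policy; as ε_t(s) decays, the policy becomes greedy and the learned greedy policy should settle on moving right in every cell, which is the optimal policy for this toy corridor.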
