ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展) ›› 2015, Vol. 52 ›› Issue (3): 579-587. doi: 10.7544/issn1000-1239.2015.20131270

• Artificial Intelligence •

A Heuristic Two-layer Reinforcement Learning Algorithm Based on BP Neural Networks (Chinese-language title)

Liu Zhibin1,2, Zeng Xiaoqin2, Liu Huiyi2, Chu Rong2

  1(School of Information Science and Engineering, Qufu Normal University, Rizhao, Shandong 276826); 2(College of Computer and Information, Hohai University, Nanjing 210098) (lzbxian@163.com)
  • Published: 2015-03-01
  • Supported by: National Natural Science Foundation of China (60971088, 60571048)

A Heuristic Two-layer Reinforcement Learning Algorithm Based on BP Neural Networks

Liu Zhibin1,2, Zeng Xiaoqin2, Liu Huiyi2, Chu Rong2   

  1(School of Information Science and Engineering, Qufu Normal University, Rizhao, Shandong 276826); 2(College of Computer and Information, Hohai University, Nanjing 210098)
  • Online: 2015-03-01

Abstract (Chinese): Reinforcement learning learns by interacting with the environment, but its learning efficiency is low in larger state spaces. Embedding prior knowledge can speed up learning, yet inappropriate prior knowledge may instead mislead the learning process and hurt learning performance. This paper proposes NNH-QL, a heuristic two-layer reinforcement learning method based on BP neural networks, which removes the blindness of the traditional reinforcement learning process. The top layer, serving as the qualitative layer, consists of a BP neural network; it needs no externally supplied background knowledge and, through reward shaping, uses dynamically acquired online knowledge to heuristically steer the table-based Q-learning at the bottom layer. The algorithm trains the neural network with eligibility traces to improve learning efficiency. NNH-QL combines the flexibility of standard Q-learning with the generalization ability of neural networks, offering a feasible way to tackle reinforcement learning problems with larger state spaces. Experimental results show that the method clearly improves reinforcement learning performance and accelerates learning.

Key words (Chinese): NNH-QL, reinforcement learning, Q-learning, neural networks, path planning

Abstract: Reinforcement learning is a promising approach in which an agent learns from repeated interaction with its environment. However, it suffers from the curse of dimensionality, and its low efficiency makes it hard to apply to large-scale problems. Embedding static prior knowledge can improve learning performance, but inappropriate knowledge often misguides the learning process or slows it down. In this paper, an online heuristic two-layer reinforcement learning algorithm based on BP neural networks, named NNH-QL, is proposed to avoid the blindness and limitations of previous learning methods. The top layer, which serves as a reward-shaping function, is built from BP neural networks. Through shaping, this qualitative layer supplies dynamically acquired online knowledge to guide the table-based Q-learning at the bottom layer. To improve the learning efficiency of the qualitative layer, eligibility traces are incorporated into the BP neural network training process. The NNH-QL method combines the flexibility of standard Q-learning with the generalization ability of BP neural networks, offering a feasible way to solve reinforcement learning problems in larger state spaces. For evaluation, the NNH-QL algorithm is applied to an optimal path search problem. The results show that the algorithm improves learning performance and clearly accelerates the learning process.
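The abstract above describes the NNH-QL architecture: a table-based Q-learner at the bottom, guided through reward shaping by a neural network that acquires its heuristic knowledge online. The paper's exact algorithm (including its eligibility-trace training of the network) is not reproduced here; the following is a minimal sketch of the general idea, assuming a toy corridor task, a one-hidden-layer network trained by plain SGD, and potential-based shaping. All names, layer sizes, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D corridor used only to exercise the sketch (not the paper's task):
# the agent starts in cell 0 and must reach cell GOAL; action 0 moves left,
# action 1 moves right.
N_STATES, GOAL = 10, 9
MOVES = (-1, +1)

class ShapingNet:
    """One-hidden-layer network standing in for the qualitative top layer.

    It learns a state potential phi(s) online; gamma*phi(s') - phi(s) is
    added to the environment reward as a heuristic shaping bonus. The plain
    SGD update below is an illustrative stand-in for the paper's
    eligibility-trace training rule.
    """
    def __init__(self, n_in, n_hid=8, lr=0.05):
        self.W1 = rng.normal(0.0, 0.1, (n_hid, n_in))
        self.W2 = rng.normal(0.0, 0.1, (1, n_hid))
        self.lr = lr

    def phi(self, s):
        x = np.zeros(N_STATES)
        x[s] = 1.0                              # one-hot state encoding
        self._x, self._h = x, np.tanh(self.W1 @ x)
        return float(self.W2 @ self._h)

    def update(self, s, target):
        err = target - self.phi(s)              # gradient step on squared error
        grad_W1 = err * (self.W2.T * (1.0 - self._h[:, None] ** 2)) @ self._x[None, :]
        self.W2 += self.lr * err * self._h[None, :]
        self.W1 += self.lr * grad_W1

def train(episodes=300, gamma=0.95, alpha=0.5, eps=0.1):
    Q = np.zeros((N_STATES, len(MOVES)))        # bottom layer: tabular Q-learning
    net = ShapingNet(N_STATES)                  # top layer: online heuristic
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            a = int(rng.integers(2)) if rng.random() < eps else int(Q[s].argmax())
            s2 = min(max(s + MOVES[a], 0), N_STATES - 1)
            r = 1.0 if s2 == GOAL else -0.01
            # Heuristic reward: environment reward plus the shaping term.
            shaped = r + gamma * net.phi(s2) - net.phi(s)
            Q[s, a] += alpha * (shaped + gamma * Q[s2].max() - Q[s, a])
            net.update(s, r + gamma * net.phi(s2))  # move phi toward the TD target
            s = s2
    return Q

Q = train()
```

Potential-based shaping is used here because, for a fixed potential, it is known to leave the optimal policy unchanged while densifying the reward signal; with an online-trained potential this holds only approximately, which matches the abstract's caution that ill-chosen heuristics can mislead learning.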

Key words: NNH-QL, reinforcement learning, Q-learning, neural networks, path planning
