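  • China Outstanding Science and Technology Journal
  • CCF-recommended Class A Chinese-language journal
  • T1-class high-quality science and technology journal in the computing field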
Liu Zhibin, Zeng Xiaoqin, Liu Huiyi, Chu Rong. A Heuristic Two-layer Reinforcement Learning Algorithm Based on BP Neural Networks[J]. Journal of Computer Research and Development, 2015, 52(3): 579-587. DOI: 10.7544/issn1000-1239.2015.20131270

A Heuristic Two-layer Reinforcement Learning Algorithm Based on BP Neural Networks

  • Published Date: February 28, 2015
  • Reinforcement learning is a promising approach in which an agent learns by repeatedly interacting with its environment. However, it suffers from the curse of dimensionality, and its low efficiency makes it hard to apply to large-scale problems. Embedding static prior knowledge can improve learning performance, but inappropriate knowledge often misguides the learning process or slows it down. This paper proposes an online heuristic two-layer reinforcement learning algorithm based on BP neural networks, named NNH-QL, to avoid the blindness and limitations of previous learning methods. The top layer, which serves as a reward shaping function, consists of BP neural networks. Through shaping, this qualitative top layer supplies dynamically acquired online knowledge to guide the table-based Q-learning. To improve the learning efficiency of the qualitative layer, eligibility traces are incorporated into the training of the BP neural networks. NNH-QL thus combines the flexibility of standard Q-learning with the generalization ability of BP neural networks, offering a feasible way to solve reinforcement learning problems with larger state spaces. In tests on an optimal path search problem, the algorithm improves learning performance and markedly accelerates learning.
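
The abstract gives no equations, so the following is only a minimal sketch of the two-layer idea under stated assumptions, not the authors' implementation. It assumes the shaping term takes the potential-based form gamma * V(s') - V(s), that the top layer is a single-hidden-layer network trained by TD(lambda), and that the task is a small grid path search. The names `TinyValueNet`, `step`, and `train`, and every hyperparameter value, are hypothetical.

```python
import math
import random

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # moves on a square grid


class TinyValueNet:
    """One-hidden-layer BP network estimating V(s), trained by TD(lambda)."""

    def __init__(self, n_in, n_hidden, rng, lr=0.05, gamma=0.95, lam=0.7):
        # The last weight in every row is a bias term.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
        self.lr, self.gamma, self.lam = lr, gamma, lam
        self.reset_traces()

    def reset_traces(self):
        self.e1 = [[0.0] * len(row) for row in self.w1]
        self.e2 = [0.0] * len(self.w2)

    def _forward(self, x):
        xb = list(x) + [1.0]
        hb = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb.append(1.0)
        return sum(w * v for w, v in zip(self.w2, hb)), xb, hb

    def value(self, x):
        return self._forward(x)[0]

    def td_update(self, x, reward, x_next, done):
        v, xb, hb = self._forward(x)
        target = reward + (0.0 if done else self.gamma * self.value(x_next))
        delta = target - v
        decay = self.gamma * self.lam
        # Traces accumulate the backpropagated gradient of V(s).
        for j, row in enumerate(self.w1):
            back = self.w2[j] * (1.0 - hb[j] ** 2)
            for i in range(len(row)):
                self.e1[j][i] = decay * self.e1[j][i] + back * xb[i]
        for j in range(len(self.w2)):
            self.e2[j] = decay * self.e2[j] + hb[j]
        # Move every weight along its trace, scaled by the TD error.
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] += self.lr * delta * self.e1[j][i]
        for j in range(len(self.w2)):
            self.w2[j] += self.lr * delta * self.e2[j]


def step(state, action, size):
    """Hypothetical grid task: the goal is the corner opposite the start."""
    x = min(max(state[0] + action[0], 0), size - 1)
    y = min(max(state[1] + action[1], 0), size - 1)
    done = (x, y) == (size - 1, size - 1)
    return (x, y), (1.0 if done else -0.01), done


def train(size=5, episodes=300, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    net = TinyValueNet(2, 6, rng, gamma=gamma)
    q = {((x, y), a): 0.0
         for x in range(size) for y in range(size) for a in range(4)}

    def feat(s):
        return (s[0] / (size - 1), s[1] / (size - 1))

    for _ in range(episodes):
        state, done, steps = (0, 0), False, 0
        net.reset_traces()
        while not done and steps < 200:
            if rng.random() < epsilon:
                a = rng.randrange(4)
            else:
                a = max(range(4), key=lambda k: q[(state, k)])
            nxt, r, done = step(state, ACTIONS[a], size)
            # Assumed shaping: potential-based, with the network as potential.
            phi_next = 0.0 if done else net.value(feat(nxt))
            shaped = r + gamma * phi_next - net.value(feat(state))
            best_next = 0.0 if done else max(q[(nxt, k)] for k in range(4))
            q[(state, a)] += alpha * (shaped + gamma * best_next - q[(state, a)])
            # The top layer learns online from the unshaped reward.
            net.td_update(feat(state), r, feat(nxt), done)
            state, steps = nxt, steps + 1
    return q, net
```

Calling `train()` returns the Q-table and the trained value network. In this sketch the table keeps one entry per state-action pair, while the network generalizes across neighboring grid cells through its two normalized coordinate inputs, which mirrors the division of labor the abstract describes.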
