ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2015, Vol. 52, Issue (3): 579-587. doi: 10.7544/issn1000-1239.2015.20131270

A Heuristic Two-layer Reinforcement Learning Algorithm Based on BP Neural Networks

Liu Zhibin1,2, Zeng Xiaoqin2, Liu Huiyi2, Chu Rong2   

  1. School of Information Science and Engineering, Qufu Normal University, Rizhao, Shandong 276826; 2. College of Computer and Information, Hohai University, Nanjing 210098
  • Online: 2015-03-01

Abstract: Reinforcement learning is a promising approach in which an agent learns by interacting repeatedly with its environment. However, it suffers from the curse of dimensionality, and its low efficiency makes it hard to apply to large-scale problems. Embedding static prior knowledge can improve learning performance, but inappropriate knowledge often misguides the learning process or slows it down. This paper proposes an online heuristic two-layer reinforcement learning algorithm based on BP neural networks, named NNH-QL, to avoid the blindness and limitations of previous learning methods. The top layer, which serves as the reward shaping function, is composed of BP neural networks. Through shaping, this qualitative top layer supplies knowledge acquired dynamically online to guide the table-based Q-learning. To improve the learning efficiency of the qualitative layer, eligibility traces are incorporated into the training of the BP neural networks. NNH-QL thus combines the flexibility of standard Q-learning with the generalization ability of BP neural networks. Together, these techniques offer a feasible way to solve reinforcement learning problems with larger state spaces. For testing, NNH-QL is applied to an optimal path search problem. The results show that the algorithm improves learning performance and clearly accelerates the learning process.
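
The two-layer idea in the abstract can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' NNH-QL implementation. A small backpropagation-trained network estimates a state value online with TD(lambda) eligibility traces, and that estimate is used as a shaping potential for tabular Q-learning on a grid path problem. The potential-based shaping form (gamma * V(s') - V(s)), the names ValueNet, step, train and greedy_path, the grid size, the reward scheme and all hyperparameters are hypothetical choices made for illustration; the abstract does not specify them.

```python
import math
import random

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # (d_row, d_col): right, left, down, up


def step(s, a, size):
    """Move on a size x size grid; a move into a wall leaves the agent in place."""
    row = min(max(s[0] + ACTIONS[a][0], 0), size - 1)
    col = min(max(s[1] + ACTIONS[a][1], 0), size - 1)
    return (row, col)


class ValueNet:
    """Hypothetical top layer: one tanh hidden layer, trained online by TD(lambda)."""

    def __init__(self, n_hidden=8, lr=0.01, gamma=0.9, lam=0.7, seed=0):
        rng = random.Random(seed)
        # Each hidden unit has two input weights plus a bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(n_hidden)]
        # Output weights for the hidden units; the last entry is the output bias.
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
        self.lr, self.gamma, self.lam = lr, gamma, lam
        self.reset_traces()

    def reset_traces(self):
        self.e1 = [[0.0] * 3 for _ in self.w1]
        self.e2 = [0.0] * len(self.w2)

    def _forward(self, x):
        h = [math.tanh(w[0] * x[0] + w[1] * x[1] + w[2]) for w in self.w1]
        v = sum(w * hj for w, hj in zip(self.w2, h)) + self.w2[-1]
        return h, v

    def value(self, x):
        return self._forward(x)[1]

    def td_update(self, x, reward, x_next, done):
        h, v = self._forward(x)
        v_next = 0.0 if done else self.value(x_next)
        delta = reward + self.gamma * v_next - v  # TD error on the unshaped reward
        decay = self.gamma * self.lam
        # Accumulate traces: e <- gamma * lambda * e + grad V(s), via backpropagation.
        for j, hj in enumerate(h):
            g_hidden = self.w2[j] * (1.0 - hj * hj)
            for k, xk in enumerate((x[0], x[1], 1.0)):
                self.e1[j][k] = decay * self.e1[j][k] + g_hidden * xk
            self.e2[j] = decay * self.e2[j] + hj
        self.e2[-1] = decay * self.e2[-1] + 1.0
        # Weight update: w <- w + lr * delta * e.
        for j in range(len(h)):
            for k in range(3):
                self.w1[j][k] += self.lr * delta * self.e1[j][k]
        for j in range(len(self.w2)):
            self.w2[j] += self.lr * delta * self.e2[j]


def train(size=4, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning whose reward is shaped by the network's value estimate."""
    rng = random.Random(seed)
    goal = (size - 1, size - 1)
    net = ValueNet(gamma=gamma, seed=seed)
    q = {}  # (state, action index) -> Q-value, missing entries count as 0.0

    def feat(s):
        return (s[0] / (size - 1), s[1] / (size - 1))

    for _ in range(episodes):
        s = (0, 0)
        net.reset_traces()
        for _ in range(100):
            if rng.random() < epsilon:
                a = rng.randrange(4)
            else:  # greedy action, ties broken at random
                a = max(range(4), key=lambda b: (q.get((s, b), 0.0), rng.random()))
            s2 = step(s, a, size)
            done = s2 == goal
            r = 1.0 if done else 0.0
            # Assumed shaping form: F = gamma * V(s') - V(s), with V(terminal) = 0.
            phi_next = 0.0 if done else net.value(feat(s2))
            shaped = r + gamma * phi_next - net.value(feat(s))
            best_next = 0.0 if done else max(q.get((s2, b), 0.0) for b in range(4))
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (shaped + gamma * best_next - old)
            net.td_update(feat(s), r, feat(s2), done)
            s = s2
            if done:
                break
    return q, net


def greedy_path(q, size=4, max_steps=50):
    """Follow the greedy policy from the start cell, for at most max_steps moves."""
    s, goal = (0, 0), (size - 1, size - 1)
    path = [s]
    while s != goal and len(path) <= max_steps:
        a = max(range(4), key=lambda b: q.get((s, b), 0.0))
        s = step(s, a, size)
        path.append(s)
    return path
```

Calling `q, net = train()` and then `greedy_path(q)` returns the sequence of cells visited by the learned greedy policy. Whether and how quickly that path reaches the goal depends on the assumed hyperparameters, so this sketch says nothing about the performance reported in the paper.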

Key words: NNH-QL, reinforcement learning, Q-learning, neural networks, path planning

CLC Number: