    Liu Zhibin, Zeng Xiaoqin, Liu Huiyi, Chu Rong. A Heuristic Two-layer Reinforcement Learning Algorithm Based on BP Neural Networks[J]. Journal of Computer Research and Development, 2015, 52(3): 579-587. DOI: 10.7544/issn1000-1239.2015.20131270


    A Heuristic Two-layer Reinforcement Learning Algorithm Based on BP Neural Networks


  Abstract: Reinforcement learning is a promising approach in which an agent learns through repeated interaction with its environment. However, it suffers from the curse of dimensionality, and its low efficiency makes it hard to apply to large-scale problems. Embedding static prior knowledge can improve learning performance, but inappropriate knowledge often misguides the learning process or slows it down. This paper proposes an online heuristic two-layer reinforcement learning algorithm based on BP neural networks, named NNH-QL, which avoids the blindness and limitations of previous learning methods. The top layer, which serves as a reward-shaping function, is a BP neural network; through shaping, this qualitative layer supplies dynamically acquired online knowledge to guide the table-based Q-learning at the bottom layer. To improve the learning efficiency of the qualitative layer, eligibility traces are incorporated into the training of the BP neural network. The NNH-QL method combines the flexibility of standard Q-learning with the generalization ability of BP neural networks, offering a feasible way to solve reinforcement learning problems in larger state spaces. For testing, the NNH-QL algorithm is applied to an optimal path search problem. The results show that it improves learning performance and clearly accelerates the learning process.
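  The two-layer scheme described in the abstract can be sketched in code. The following is a minimal, hypothetical illustration, not the paper's implementation: tabular Q-learning at the bottom layer on a small grid path-search task, with a one-hidden-layer BP network at the top layer learning a potential function whose shaping term is added to the reward, and accumulating eligibility traces applied to the network's weight updates. All names, hyperparameters, network sizes, and the exact trace scheme here are assumptions made for illustration.

  ```python
  import numpy as np

  rng = np.random.default_rng(0)

  # Bottom layer: table-based Q-learning on an N x N grid path-search task.
  # Top layer: a one-hidden-layer BP network learning a potential Phi(s);
  # the shaping term F = gamma * Phi(s') - Phi(s) is added to the reward.
  N = 5                                  # start at (0,0), goal at (N-1,N-1)
  ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]
  GAMMA, ALPHA, EPS, LAM, LR = 0.95, 0.5, 0.1, 0.7, 0.01

  def step(s, a):
      dr, dc = ACTIONS[a]
      s2 = (min(max(s[0] + dr, 0), N - 1), min(max(s[1] + dc, 0), N - 1))
      done = s2 == (N - 1, N - 1)
      return s2, (10.0 if done else -1.0), done

  def onehot(s):
      x = np.zeros(N * N)
      x[s[0] * N + s[1]] = 1.0
      return x

  # Tiny BP network: one-hot state -> tanh hidden layer -> scalar potential.
  H = 16
  W1 = rng.normal(0, 0.1, (H, N * N)); b1 = np.zeros(H)
  W2 = rng.normal(0, 0.1, H);          b2 = 0.0

  def phi(s):
      h = np.tanh(W1 @ onehot(s) + b1)
      return W2 @ h + b2, h

  Q = np.zeros((N * N, 4))

  for episode in range(300):
      s = (0, 0)
      # Accumulating eligibility traces over the network's weights.
      eW1, eb1 = np.zeros_like(W1), np.zeros_like(b1)
      eW2, eb2 = np.zeros_like(W2), 0.0
      for _ in range(200):
          i = s[0] * N + s[1]
          a = rng.integers(4) if rng.random() < EPS else int(np.argmax(Q[i]))
          s2, r, done = step(s, a)
          j = s2[0] * N + s2[1]

          # Shaping term supplied by the qualitative top layer.
          p2, _ = phi(s2)
          p1, h1 = phi(s)
          F = GAMMA * p2 - p1

          target = 0.0 if done else np.max(Q[j])
          Q[i, a] += ALPHA * (r + F + GAMMA * target - Q[i, a])

          # Train the network toward the TD value estimate, via traces.
          delta = (r + GAMMA * (0.0 if done else p2)) - p1
          gh = W2 * (1 - h1 ** 2)            # backprop through tanh layer
          eW2 = GAMMA * LAM * eW2 + h1
          eb2 = GAMMA * LAM * eb2 + 1.0
          eW1 = GAMMA * LAM * eW1 + np.outer(gh, onehot(s))
          eb1 = GAMMA * LAM * eb1 + gh
          W2 += LR * delta * eW2; b2 += LR * delta * eb2
          W1 += LR * delta * eW1; b1 += LR * delta * eb1

          s = s2
          if done:
              break

  # Follow the greedy policy; the optimal path on this grid takes 8 steps.
  s, steps = (0, 0), 0
  while s != (N - 1, N - 1) and steps < 50:
      s, _, _ = step(s, int(np.argmax(Q[s[0] * N + s[1]])))
      steps += 1
  print(steps)
  ```

  The shaping term has the potential-based form γΦ(s′) − Φ(s), which is known to preserve the optimal policy when Φ is fixed; here Φ is learned online, which is what lets the top layer heuristically guide the tabular learner without requiring externally supplied background knowledge.
  
  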
