    Citation: Lin Fen, Shi Chuan, Luo Jiewen, Shi Zhongzhi. Dual Reinforcement Learning Based on Bias Learning[J]. Journal of Computer Research and Development, 2008, 45(9): 1455-1462.


    Dual Reinforcement Learning Based on Bias Learning

    • Abstract: Conventional reinforcement learning suffers from slow convergence. Presetting a bias from prior knowledge can accelerate learning, but when the prior knowledge is incorrect, the learning process may fail to converge. To address this, a dual reinforcement learning model based on bias learning is proposed. The model couples the reinforcement learning process with a bias learning process: bias information guides the action selection strategy of reinforcement learning, while reinforcement learning in turn guides the bias learning process. The method thus exploits prior knowledge effectively while eliminating the influence of incorrect prior knowledge. Experiments on maze problems show that the method converges stably to the optimal policy, makes effective use of prior knowledge to improve learning efficiency, and accelerates convergence of the learning process.

       

      Abstract: Reinforcement learning has received much attention in the past decade. Its incremental nature and adaptive capabilities make it suitable for use in various domains, such as automatic control, mobile robotics, and multi-agent systems. A critical problem in conventional reinforcement learning is the slow convergence of the learning process. To accelerate learning, bias information derived from prior knowledge can be incorporated into the learning process. Current methods use such bias information in the action selection strategy of reinforcement learning, but they may suffer from non-convergence when the prior knowledge is incorrect. A dual reinforcement learning model based on bias learning is proposed, which integrates the reinforcement learning process and the bias learning process: bias information guides the action selection strategy of reinforcement learning, while reinforcement learning in turn guides the bias learning process. The dual model can thus make effective use of prior knowledge while eliminating the negative effects of incorrect prior knowledge. Finally, the proposed dual model is validated by experiments on maze problems in both simple and complex environments. The experimental results demonstrate that the model converges steadily to the optimal policy; moreover, it improves learning performance and speeds up the convergence of the learning process.
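To make the two-layer interplay concrete, here is a minimal sketch in the spirit of the abstract, not the authors' exact algorithm: tabular Q-learning on a short corridor where a bias table (an intentionally wrong prior favoring "left") steers action selection, while the learned Q-values in turn re-train the bias. All names, the corridor environment, and the update rules are illustrative assumptions.

```python
import math
import random

random.seed(0)

# Hypothetical sketch (not the paper's exact algorithm): Q-learning on a
# corridor with states 0..4 and the goal at 4. Action selection is steered
# by a bias table; the learned Q-values re-train the bias, so an
# incorrect prior (favoring LEFT) is gradually unlearned.

N = 5                                   # corridor length, goal at N - 1
LEFT, RIGHT = 0, 1
MOVE = {LEFT: -1, RIGHT: +1}

Q = [[0.0, 0.0] for _ in range(N)]      # value estimates
bias = [[0.5, 0.0] for _ in range(N)]   # incorrect prior: prefer LEFT

alpha, gamma = 0.5, 0.9                 # Q-learning step size, discount
beta, tau = 0.1, 0.5                    # bias step size, softmax temperature

def select(s):
    """Softmax over Q + bias: the prior shapes exploration."""
    scores = [Q[s][a] + bias[s][a] for a in (LEFT, RIGHT)]
    mx = max(scores)
    weights = [math.exp((v - mx) / tau) for v in scores]
    r = random.random() * sum(weights)
    return RIGHT if r > weights[LEFT] else LEFT

for episode in range(300):
    s = 0
    while s != N - 1:
        a = select(s)
        s2 = max(0, min(N - 1, s + MOVE[a]))
        r = 1.0 if s2 == N - 1 else 0.0
        # Ordinary Q-learning update (the reinforcement learning layer).
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        # The bias learning layer: once the value function distinguishes
        # the two actions, pull the bias toward the greedy action, so the
        # incorrect prior is gradually overwritten by experience.
        if Q[s][LEFT] != Q[s][RIGHT]:
            greedy = RIGHT if Q[s][RIGHT] > Q[s][LEFT] else LEFT
            for b in (LEFT, RIGHT):
                target = 1.0 if b == greedy else 0.0
                bias[s][b] += beta * (target - bias[s][b])
        s = s2
```

The guard on the bias update is one way to realize "reinforcement learning guides bias learning": the bias is only adjusted once the value estimates carry information, so a wrong prior slows early exploration but cannot prevent eventual convergence.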

       

