Lin Fen, Shi Chuan, Luo Jiewen, Shi Zhongzhi. Dual Reinforcement Learning Based on Bias Learning[J]. Journal of Computer Research and Development, 2008, 45(9): 1455-1462.
1(Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190) 2(Graduate University of Chinese Academy of Sciences, Beijing 100049) 3(Beijing Key Laboratory of Smart Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876)
Reinforcement learning has received much attention in the past decade. Its incremental nature and adaptive capabilities make it suitable for use in various domains, such as automatic control, mobile robotics, and multi-agent systems. A critical problem in conventional reinforcement learning is the slow convergence of the learning process. To accelerate learning, bias information is incorporated so that prior knowledge can boost the learning process. Existing methods use bias information only in the action selection strategy of reinforcement learning, and they may suffer from non-convergence when the prior knowledge is incorrect. A dual reinforcement learning model based on bias learning is proposed, which integrates the reinforcement learning process and the bias learning process: bias information guides action selection in reinforcement learning, while the reinforcement signal in turn guides the bias learning process. Thus the dual model can make effective use of correct prior knowledge while eliminating the negative effects of incorrect prior knowledge. Finally, the proposed dual model is validated by experiments on maze problems in both simple and complex environments. The experimental results demonstrate that the model converges steadily to the optimal strategy; moreover, it improves learning performance and speeds up the convergence of the learning process.
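To make the dual idea concrete, the following is a minimal sketch, not the authors' exact algorithm: a tabular Q-learning agent on a small maze whose action selection adds a learned bias term B(s, a). The bias table is seeded with (possibly incorrect) prior knowledge and is itself updated from the reinforcement signal, so wrong priors are gradually washed out. The maze layout, the hyperparameters (ALPHA, BETA, GAMMA, EPSILON), and the particular bias-update rule are all illustrative assumptions.

```python
import random

ROWS, COLS = 5, 5
START, GOAL = (0, 0), (4, 4)
WALLS = {(1, 1), (2, 3), (3, 1)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

ALPHA, GAMMA = 0.1, 0.95   # Q-learning step size and discount factor
BETA = 0.05                # bias-learning step size (assumed)
EPSILON = 0.1              # exploration rate

def step(state, action):
    """Apply an action; hitting a wall or the border leaves the state unchanged."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < ROWS and 0 <= c < COLS) or (r, c) in WALLS:
        r, c = state
    reward = 1.0 if (r, c) == GOAL else -0.01
    return (r, c), reward, (r, c) == GOAL

# Q-values and bias values; the bias table encodes the prior knowledge,
# here a deliberately crude "prefer moving right" heuristic.
Q = {((r, c), a): 0.0 for r in range(ROWS) for c in range(COLS) for a in ACTIONS}
B = {key: (0.5 if key[1] == (0, 1) else 0.0) for key in Q}

def select_action(state):
    """Epsilon-greedy over the biased value Q(s, a) + B(s, a)."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)] + B[(state, a)])

for episode in range(500):
    state, done = START, False
    while not done:
        action = select_action(state)
        nxt, reward, done = step(state, action)
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        td_error = reward + GAMMA * best_next * (not done) - Q[(state, action)]
        Q[(state, action)] += ALPHA * td_error
        # Dual update: the bias is reinforced when it agrees with the
        # TD signal and decays toward it otherwise, so an incorrect
        # prior fades instead of blocking convergence.
        B[(state, action)] += BETA * (td_error - B[(state, action)])
        state = nxt

greedy = max(ACTIONS, key=lambda a: Q[(START, a)] + B[(START, a)])
print("greedy action at start:", greedy)
```

In this sketch the bias only shifts action preferences; the Q-update itself is standard, which is one simple way to keep the combined strategy convergent even when the initial bias points away from the goal.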