ISSN 1000-1239 CN 11-1777/TP

• Paper •

### An Optimized Dyna Architecture Algorithm with Prioritized Sweeping

Sun Hongkun1, Liu Quan1,2, Fu Qiming1, Xiao Fei1, and Gao Long1

1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006
2. Key Laboratory of Symbol Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun 130012

• Online: 2013-10-15

Abstract: Reinforcement learning addresses sequential decision making in environments for which no model is given in advance. The agent aims to maximize the reward it accumulates by acting in its environment over an extended period of time. Finding the optimal policy by direct RL alone can be very slow, and a commonly used way to speed up convergence is to integrate learning with planning. To further improve the convergence time and convergence precision of Dyna-architecture algorithms, this paper proposes Dyna-PS, an optimized Dyna-architecture algorithm with prioritized sweeping, and gives a theoretical proof of its convergence. The key idea of Dyna-PS is to integrate prioritized sweeping into the Dyna architecture so that, in the planning part, states are updated in order of their priority functions. It also omits updates to insignificant and unrelated states, which traditional value iteration and policy iteration routinely update. Experimental results on a maze scenario and a series of classical AI programming problems show that Dyna-PS achieves better convergence performance and is more robust to growth of the state space.
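The idea of ordering planning updates by priority can be illustrated with a minimal sketch. This is not the Dyna-PS algorithm of the paper, whose priority function and pruning rule are not given in the abstract. It is the generic textbook form of tabular prioritized sweeping inside a Dyna-style loop, assuming a deterministic environment. The function name `prioritized_sweeping`, the toy environment `chain_step`, and all parameter values are hypothetical choices made for illustration.

```python
import heapq
import random
from collections import defaultdict

def prioritized_sweeping(step, start, goal, actions, episodes=30,
                         n_planning=10, alpha=0.5, gamma=0.95,
                         epsilon=0.1, theta=1e-4, seed=0):
    """Generic tabular prioritized sweeping (illustrative, not the paper's Dyna-PS)."""
    rng = random.Random(seed)
    Q = defaultdict(float)            # Q[(state, action)], initialised to 0
    model = {}                        # learned model: (s, a) -> (reward, next_state)
    predecessors = defaultdict(set)   # next_state -> {(s, a)} known to lead to it
    queue = []                        # max-priority queue via negated priorities

    def max_q(s):
        return max(Q[(s, a)] for a in actions)

    def priority(s, a, r, s2):
        # Magnitude of the one-step backup error for (s, a).
        return abs(r + gamma * max_q(s2) - Q[(s, a)])

    for _ in range(episodes):
        s = start
        while s != goal:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                best = max_q(s)
                a = rng.choice([b for b in actions if Q[(s, b)] == best])
            r, s2 = step(s, a)

            # Learning part: record the real transition in the model.
            model[(s, a)] = (r, s2)
            predecessors[s2].add((s, a))
            p = priority(s, a, r, s2)
            if p > theta:
                heapq.heappush(queue, (-p, s, a))

            # Planning part: back up the highest-priority pairs first.
            for _ in range(n_planning):
                if not queue:
                    break
                _, ps, pa = heapq.heappop(queue)
                pr, ps2 = model[(ps, pa)]
                Q[(ps, pa)] += alpha * (pr + gamma * max_q(ps2) - Q[(ps, pa)])
                # Only pairs that lead into the updated state are reconsidered;
                # pairs whose backup error stays below theta are skipped.
                for qs, qa in predecessors[ps]:
                    qr, _ = model[(qs, qa)]
                    qp = priority(qs, qa, qr, ps)
                    if qp > theta:
                        heapq.heappush(queue, (-qp, qs, qa))
            s = s2
    return Q

# Hypothetical toy environment: a 6-state chain with reward 1 on reaching the goal.
GOAL = 5
ACTIONS = (-1, 1)

def chain_step(s, a):
    s2 = min(max(s + a, 0), GOAL)
    return (1.0 if s2 == GOAL else 0.0), s2

Q = prioritized_sweeping(chain_step, start=0, goal=GOAL, actions=ACTIONS)
greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)}
```

In this sketch the threshold `theta` plays the role of discarding insignificant updates, and the predecessor table restricts planning to states related to the most recent change. Both are assumptions standing in for the priority functions that the paper itself defines.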