An Optimized Dyna Architecture Algorithm with Prioritized Sweeping

Sun Hongkun; Liu Quan; Fu Qiming; Xiao Fei; Gao Long

Sun Hongkun, Liu Quan, Fu Qiming, Xiao Fei, Gao Long. An Optimized Dyna Architecture Algorithm with Prioritized Sweeping[J]. Journal of Computer Research and Development, 2013, 50(10): 2176-2184.


Abstract

Reinforcement learning involves sequential decision making in model-free environments. The agent aims to maximize the reward it accumulates by acting in its environment over an extended period of time. Finding the optimal policy by direct RL alone can be very slow, and a common way to speed up convergence is to integrate learning with planning. To further improve the convergence time and convergence precision of Dyna-structure algorithms, this paper proposes Dyna-PS, an optimized Dyna-structure algorithm with prioritized sweeping, and gives a theoretical proof of its convergence. The key idea of Dyna-PS is to integrate prioritized sweeping into the Dyna architecture so that, in the planning part, states are updated according to their priority functions. It also skips the updates of insignificant and unrelated states that traditional value iteration and policy iteration routinely perform. Experimental results show that Dyna-PS achieves better convergence performance and better robustness to state-space growth when applied to a maze scenario and a series of classical AI programming problems.
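The idea of ordering planning updates by priority inside a Dyna loop can be illustrated with a minimal sketch. The abstract does not specify the priority function or the update rule used by Dyna-PS, so the code below is not the authors' algorithm. It is the standard textbook form of prioritized sweeping with a deterministic learned model, where the priority of a state-action pair is the magnitude of its one-step Bellman error. The corridor environment, the function name `dyna_prioritized_sweeping`, and all hyperparameter values are assumptions made for illustration.

```python
import heapq
import random
from collections import defaultdict


def dyna_prioritized_sweeping(n_states=6, episodes=30, planning_steps=5,
                              alpha=0.5, gamma=0.9, epsilon=0.1,
                              theta=1e-4, seed=0):
    """Tabular Dyna-style learning with prioritized sweeping (illustrative).

    Hypothetical environment: a 1-D corridor with states 0..n_states-1.
    The agent starts in state 0; entering the last state gives reward 1
    and ends the episode. Actions: 0 = left, 1 = right.
    """
    rng = random.Random(seed)
    actions = (0, 1)
    goal = n_states - 1
    Q = defaultdict(float)           # Q[(state, action)], initialised to 0
    model = {}                       # (s, a) -> (reward, next_state)
    predecessors = defaultdict(set)  # s' -> {(s, a)} observed to lead to s'
    queue = []                       # min-heap of (-priority, (s, a))

    def step(s, a):
        s2 = max(0, s - 1) if a == 0 else min(goal, s + 1)
        return (1.0 if s2 == goal else 0.0), s2

    def best_value(s):
        # The goal state is terminal, so its value is 0.
        return 0.0 if s == goal else max(Q[(s, b)] for b in actions)

    def priority(s, a, r, s2):
        # Assumed priority: magnitude of the one-step Bellman error.
        return abs(r + gamma * best_value(s2) - Q[(s, a)])

    for _episode in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                best = max(Q[(s, b)] for b in actions)
                a = rng.choice([b for b in actions if Q[(s, b)] == best])

            # Real experience: record it in the model and queue it.
            r, s2 = step(s, a)
            model[(s, a)] = (r, s2)
            predecessors[s2].add((s, a))
            p = priority(s, a, r, s2)
            if p > theta:
                heapq.heappush(queue, (-p, (s, a)))

            # Planning: update the highest-priority pairs first.
            # Stale duplicate heap entries are tolerated; they only
            # trigger an extra, still valid, update.
            for _plan in range(planning_steps):
                if not queue:
                    break
                _neg_p, (ps, pa) = heapq.heappop(queue)
                pr, ps2 = model[(ps, pa)]
                target = pr + gamma * best_value(ps2)
                Q[(ps, pa)] += alpha * (target - Q[(ps, pa)])
                # The value of ps changed, so re-check pairs that lead to it.
                for (qs, qa) in predecessors[ps]:
                    qr, _qnext = model[(qs, qa)]
                    qp = priority(qs, qa, qr, ps)
                    if qp > theta:
                        heapq.heappush(queue, (-qp, (qs, qa)))
            s = s2
    return Q
```

Calling `dyna_prioritized_sweeping()` returns the learned action-value table. In this sketch, pairs whose Bellman error stays below `theta` are never queued, which mirrors the abstract's point that insignificant or unrelated states need not be updated during planning.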
