Abstract:
To address the "curse of dimensionality" that reinforcement learning faces in large or continuous state spaces, we propose a scalable reinforcement learning method, IS-SRL, based on a divide-and-conquer strategy, and prove its convergence. In this method, a learning problem with a large or continuous state space is divided into smaller subproblems, each of which can be learned independently in memory. After a cycle of learning, the next subproblem is swapped in to continue the learning process. Information is exchanged between subproblems during each swap, so that learning eventually converges to the optimum. Because the order in which subproblems are executed significantly affects learning efficiency, we propose an efficient scheduling algorithm that exploits the distribution of value-function backups in reinforcement learning and weights the priorities of multiple scheduling strategies. This algorithm ensures that computation is focused on the regions of the problem space expected to be most productive. To further expedite learning, we propose a parallel scheduling architecture that flexibly allocates learning tasks among learning agents. Combining this architecture with IS-SRL yields a new method, IS-SPRL. Experimental results show that learning with this scheduling architecture converges faster and scales well.
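Since the abstract only outlines the approach, the minimal Python sketch below illustrates the general idea of the swap-in/swap-out learning loop with priority-weighted scheduling it describes. All names, the two priority signals, and the boundary-value exchange are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

class Subproblem:
    """A region of the state space that can be learned independently in memory."""
    def __init__(self, n_states):
        self.values = np.zeros(n_states)   # local value function for this region
        self.bellman_error = 1.0           # priority signal 1: pending backup magnitude (assumed)
        self.staleness = 0                 # priority signal 2: cycles since last swap-in (assumed)

    def learn_cycle(self, boundary_values):
        # Placeholder for one in-memory learning cycle over this region,
        # using boundary values exchanged from neighbouring subproblems.
        self.values += 0.1 * (boundary_values.mean() - self.values)
        self.bellman_error *= 0.5          # error shrinks as the region converges
        self.staleness = 0
        return self.values

def pick_next(subproblems, weights=(0.7, 0.3)):
    """Weight the priorities of multiple strategies and pick the next subproblem to swap in."""
    scores = [weights[0] * sp.bellman_error + weights[1] * sp.staleness
              for sp in subproblems]
    return int(np.argmax(scores))

subproblems = [Subproblem(n_states=100) for _ in range(4)]
boundary = np.zeros(100)
for cycle in range(20):
    idx = pick_next(subproblems)                        # scheduling decision
    boundary = subproblems[idx].learn_cycle(boundary)   # exchange information via the swap
    for j, sp in enumerate(subproblems):
        if j != idx:
            sp.staleness += 1
```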