    Shi Chuan, Shi Zhongzhi, Wang Maoguang. Online Hierarchical Reinforcement Learning Based on Path-matching[J]. Journal of Computer Research and Development, 2008, 45(9).


    Online Hierarchical Reinforcement Learning Based on Path-matching

    • 摘要 (Abstract): How to find correct subgoals online is the key problem for option-based hierarchical reinforcement learning. By analyzing the learning agent's actions at subgoals, it is found that the effective actions at a subgoal are restricted, so the problem of finding subgoals can be recast as finding the most matching action-restricted states along the agent's paths. For grid learning environments, the unique-direction value is proposed to represent this action-restriction property of subgoals, together with an automatic option-discovery algorithm based on it. Experiments show that the options generated by the unique-direction value method can significantly speed up the Q-learning algorithm; the experiments also analyze how the generating time and size of options affect the performance of Q learning.

       

      Abstract: Although reinforcement learning (RL) is an effective approach for building autonomous agents that improve their performance with experience, a fundamental problem of standard RL algorithms is that in practice they cannot solve realistic tasks in reasonable time. Hierarchical reinforcement learning (HRL) is a successful solution that decomposes the learning task into simpler subtasks and learns each of them independently. As a promising form of HRL, options are introduced as closed-loop policies over sequences of actions. A key problem for option-based HRL is to discover the correct subgoals online. By analyzing the actions of agents at subgoals, two useful properties are found: (1) subgoals are more likely to be passed through, and (2) the effective actions at subgoals are restricted. Consequently, subgoals can be regarded as the most matching action-restricted states in the agent's paths. For grid environments, the concept of the unique-direction value is proposed to denote this action-restriction property, and an option-discovery algorithm based on the unique-direction value is introduced. Experiments show that the options discovered by the unique-direction value method can speed up primitive Q learning significantly. Moreover, the experiments further analyze how the size and generating time of options affect the performance of Q learning.
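    The abstract names the unique-direction value but does not define it, so the following Python sketch only illustrates the underlying idea: along a recorded path in a grid world, the states whose effective (unblocked) actions are most restricted, such as doorways between rooms, are natural subgoal candidates. The grid encoding, helper names, and the effective-action count used as a stand-in for the unique-direction value are assumptions made for illustration, not the paper's definition.

```python
# Illustrative sketch only: a simple effective-action count stands in for the
# action-restriction property of candidate subgoals described in the abstract.

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def effective_actions(grid, state):
    """Count the moves from `state` that are not blocked by a wall ('#')."""
    r, c = state
    count = 0
    for dr, dc in ACTIONS:
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] != '#':
            count += 1
    return count

def candidate_subgoals(grid, path):
    """Return the states on a recorded path whose actions are most restricted,
    i.e. the 'most matching action-restricted states' of the abstract."""
    scores = {s: effective_actions(grid, s) for s in path}
    best = min(scores.values())
    return [s for s, v in scores.items() if v == best]

# A two-room grid whose doorway (2, 4) is the natural subgoal.
grid = ["#########",
        "#   #   #",
        "#       #",
        "#   #   #",
        "#########"]
path = [(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (2, 7)]
print(candidate_subgoals(grid, path))  # -> [(2, 4)], the doorway state
```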

       
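    The abstract reports that the discovered options speed up primitive Q learning. As a rough illustration of how a discovered option is commonly combined with Q learning, the sketch below uses a multi-step (SMDP-style) update; the paper's actual learning setup is not given in the abstract, so the environment interface and all names here are assumed.

```python
# Hedged sketch: using a discovered option (a closed-loop policy that drives
# the agent to a subgoal) alongside primitive actions in Q learning.
# The environment interface (env.step -> (state, reward, done)) and all names
# are assumptions for illustration, not the paper's implementation.
from collections import defaultdict

GAMMA, ALPHA = 0.9, 0.1

def execute_option(env, state, option_policy, is_subgoal):
    """Follow the option's internal policy until the subgoal is reached,
    accumulating the discounted reward and the number of elapsed steps."""
    total, discount, steps = 0.0, 1.0, 0
    done = False
    while not is_subgoal(state) and not done:
        state, reward, done = env.step(option_policy(state))
        total += discount * reward
        discount *= GAMMA
        steps += 1
    return state, total, steps

def smdp_q_update(Q, s, o, s_next, cum_reward, k, choices):
    """Multi-step Q update for an option that ran for k primitive steps:
    Q(s, o) += alpha * [R + gamma^k * max_a Q(s', a) - Q(s, o)]."""
    target = cum_reward + (GAMMA ** k) * max(Q[(s_next, a)] for a in choices)
    Q[(s, o)] += ALPHA * (target - Q[(s, o)])

# Usage outline: Q = defaultdict(float); after executing option `o` from state
# `s`, call smdp_q_update(Q, s, o, s_next, cum_reward, k, primitives_plus_options).
```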

