    Shi Chuan, Shi Zhongzhi, Wang Maoguang. Online Hierarchical Reinforcement Learning Based on Path-matching[J]. Journal of Computer Research and Development, 2008, 45(9).

    Online Hierarchical Reinforcement Learning Based on Path-matching

    • Although reinforcement learning (RL) is an effective approach for building autonomous agents that improve their performance with experience, a fundamental problem of standard RL algorithms is that in practice they are not solvable in reasonable time. Hierarchical reinforcement learning (HRL) is a successful solution that decomposes the learning task into simpler subtasks and learns each of them independently. Among HRL frameworks, the option framework is promising: options are closed-loop policies over sequences of actions. A key problem for option-based HRL is discovering the correct subgoals online. By analyzing the actions of agents at subgoals, two useful properties are found: (1) subgoals are more likely to be passed through, and (2) the effective actions at subgoals are restricted. Consequently, subgoals can be regarded as the most-matching action-restricted states on the agent's paths. For grid environments, the concept of a unique-direction value is proposed to capture the action-restricted property, and an option-discovery algorithm based on the unique-direction value is introduced. Experiments show that options discovered by the unique-direction value method speed up primitive Q-learning significantly. Moreover, the experiments further analyze how the size and generation time of options affect the performance of Q-learning.
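    The subgoal-discovery idea in the abstract can be sketched in code. The snippet below is a minimal illustration, not the paper's algorithm: it assumes trajectories are recorded as (state, action) pairs, counts how often each state is passed through, and scores each frequently visited state by the fraction of passes that used a single dominant action, treating that fraction as a stand-in for the unique-direction value. The function name, thresholds, and data layout are all illustrative assumptions.

    ```python
    from collections import defaultdict

    def find_subgoal_candidates(paths, min_visits=5, direction_threshold=0.9):
        """Score candidate subgoal states from agent trajectories.

        Hypothetical sketch of path-matching subgoal discovery: a state is a
        candidate subgoal when (1) it is passed through often and (2) the
        actions taken there are restricted to (almost) one direction, i.e.
        its unique-direction score is high.
        """
        visits = defaultdict(int)                       # state -> pass count
        action_counts = defaultdict(lambda: defaultdict(int))  # state -> action -> count
        for path in paths:                              # path: [(state, action), ...]
            for state, action in path:
                visits[state] += 1
                action_counts[state][action] += 1

        candidates = []
        for state, n in visits.items():
            if n < min_visits:                          # ignore rarely visited states
                continue
            dominant = max(action_counts[state].values())
            score = dominant / n                        # fraction using one action
            if score >= direction_threshold:
                candidates.append((state, score))
        return sorted(candidates, key=lambda sv: -sv[1])
    ```

    For example, a grid state that every recorded path crosses with the same action (a doorway traversed "right" on every pass) scores 1.0 and surfaces as a candidate, while an open-area state visited with mixed actions falls below the threshold.
    
    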
