Abstract:
As a common temporal abstraction method in hierarchical reinforcement learning, the option framework allows agents to learn policies at different time scales, which can effectively address sparse-reward problems. To ensure that options guide agents to visit a larger portion of the state space, some methods improve option diversity by introducing mutual information into the intrinsic reward and the termination function. However, this slows down learning and weakens the transferability of the intra-option policies, which severely degrades algorithm performance. To address these problems, a diversity-enriched option-critic algorithm with interest functions (DEOC-IF) is proposed. Building on the diversity-enriched option-critic (DEOC) algorithm, DEOC-IF introduces an interest function that constrains the high-level policy's selection over the intra-option policies. This not only preserves the diversity of the option set but also encourages the learned intra-option policies to focus on different regions of the state space, which improves the knowledge transfer ability of the algorithm and accelerates learning. In addition, DEOC-IF introduces a new update gradient for the interest function, which improves the exploration ability of the algorithm. To verify the effectiveness and option reusability of the algorithm, comparison experiments were carried out on the Four-Rooms navigation task, MuJoCo, and MiniWorld. Experimental results show that DEOC-IF achieves better performance and option reusability than competing algorithms.