Exploration Approaches in Deep Reinforcement Learning Based on Intrinsic Motivation: A Review
-
摘要:
近年来,深度强化学习(deep reinforcement learning, DRL)在游戏人工智能、机器人等领域取得了诸多重要成就. 然而,在具有稀疏奖励、随机噪声等特性的现实应用场景中,该类方法面临着状态动作空间探索困难的问题. 基于内在动机的深度强化学习探索方法是解决上述问题的一种重要思想. 首先解释了深度强化学习探索困难的问题内涵,介绍了3种经典探索方法,并讨论了这3种方法在高维或连续场景下的局限性;接着描述了内在动机引入深度强化学习的背景和算法模型的常用测试环境,在此基础上详细梳理各类探索方法的基本原理、优势和缺陷,包括基于计数、基于知识和基于能力3类方法;然后介绍了基于内在动机的深度强化学习技术在不同领域的应用情况;最后总结亟需解决的难以构建有效状态表示等关键问题以及结合表示学习、知识积累等领域方向的研究展望.
Abstract: In recent years, deep reinforcement learning has achieved many important results in game artificial intelligence, robotics, and other fields. However, in realistic application scenarios with sparse rewards and random noise, such methods suffer from the difficulty of exploring the large state-action space. Introducing the notion of intrinsic motivation from psychology into deep reinforcement learning is an important idea for solving this problem. Firstly, the nature of the exploration difficulty in deep reinforcement learning is explained, three classical exploration methods are introduced, and their limitations in high-dimensional or continuous scenarios are discussed. Secondly, the background of introducing intrinsic motivation into deep reinforcement learning and the common testing environments for algorithms and models are described. On this basis, the basic principles, advantages, and disadvantages of various exploration methods are analyzed in detail, covering count-based, knowledge-based, and competence-based approaches. Then, applications of intrinsically motivated deep reinforcement learning in different fields are introduced. Finally, this paper highlights the key problems that remain to be solved, such as the difficulty of constructing effective state representations, and points out prospective research directions such as representation learning and knowledge accumulation. Hopefully, this review can provide readers with guidance for designing suitable intrinsic rewards for the problems at hand and for devising more effective exploration algorithms.
-
天际线(Skyline)查询[1]作为多目标决策、兴趣点发现、推荐系统等领域关键问题的一种解决途径,在2001年被提出,自此受到研究学者的广泛关注与研究. 近些年,Skyline查询研究拓展到不确定数据Skyline查询[2]﹑数据流Skyline查询[3]﹑动态Skyline查询[4]﹑反Skyline查询[5] 、偏好Skyline查询等方面,其中偏好Skyline查询可以返回满足用户偏好需求的结果集. 针对因用户偏好不同导致属性的重要性不同问题,研究者们提出了新的支配关系与算法. 但已有研究主要集中在非道路网的用户偏好Skyline查询或者道路网单用户偏好Skyline查询方面,没有考虑道路网多用户偏好和权重的Top-k Skyline查询.
传统偏好Skyline查询算法主要存在3点局限性:1)偏好Skyline查询需要确定属性的重要程度,由于不同用户权重与偏好不同,因此不同属性的重要程度也不一致,而已有研究中较少有提出将用户偏好和权重综合考虑,得到对用户群统一的属性重要程度次序处理方法;2)传统偏好Skyline查询算法大多未考虑道路网环境下的距离维度,只考虑静态维度;3)传统偏好Skyline查询算法返回的结果集过大、无序,不能给用户提供有效的决策支持.
因此,针对道路网多用户偏好Top-k Skyline查询问题,本文提出满足多用户不同权重和偏好需求的查询方法.
本文的主要贡献有3点:
1)道路网中存在大量数据点且查询用户较多时,需要计算各数据点到每个查询用户的道路网距离,距离计算开销很大. 为提升距离计算效率,本文根据所提的Vor-R*-DHash索引结构以及数据点与查询用户群的空间位置关系,提前剪枝在距离维度被支配的大量数据点.
2)针对道路网Top-k Skyline查询处理时未综合考虑多用户不同权重和偏好、返回结果集数量不可控的问题,本文首先提出整体属性权重值的概念,综合考虑用户权重和偏好;进一步提出用户群权重偏好次序,并基于此次序提出一种新的支配关系,即K-准放松支配;接着根据偏好次序逐次放松支配,使返回结果集大小可控;同时当k值改变时,只需动态调整放松轮次即可获取候选结果集CS,无需重新计算距离、偏好次序等,减少了查询计算开销.
3)针对Skyline查询返回结果集无序的问题,本文基于z-整体属性权重值,提出了选取Top-k个结果集的打分函数,对候选结果集CS打分排序,返回有序结果集.
1. 相关工作
Skyline查询主要分为集中式查询和分布式查询. 其中集中式查询主要分为使用索引结构和不使用索引结构. 使用索引结构的算法常用R-tree等索引结构,例如文献[6]利用最近邻(nearest neighbor,NN)算法和R-tree索引查找Skyline点,基于R-tree可以快速判断数据点是否为Skyline点,接着利用数据点进行子集合的划分,递归查找Skyline点. 不使用索引结构的Skyline查询算法主要有基于排序的SFS(sort-filter Skyline)算法[7]. 而Skyline查询在不断发展过程中又产生了许多变种问题,例如K-支配空间Skyline查询[8]﹑连续Skyline查询[9]﹑针对推荐系统的范围障碍空间连续Skyline查询[10]﹑概率Skyline查询[11]以及Top-k Skyline查询等[12-13].
在集中式计算环境下,文献[14]根据用户不同偏好提出了维度不确定的定义,根据维度特征划分数据,进行Skyline概率支配测试,同时利用阈值处理大规模数据集Skyline查询问题. 文献[15]提出一种高效偏序域Skyline查询处理方法,利用倒排索引进行Skyline查询. 在并行计算环境下,文献[16]提出了不完全数据集的偏好Skyline查询算法SPQ(Skyline preference query). 文献[17]根据用户的偏好,基于Voronoi图将数据对象划分到不同网格中,并行计算所有对象组合,获取动态Skyline结果. 文献[18]提出了MapReduce下Top-k Skyline偏好查询.
道路网Skyline查询近些年来也受到越来越多的关注. 道路网Skyline查询既考虑数据点的路网空间属性,又考虑非空间属性. 文献[19]提出了基于范围的移动对象连续Skyline查询处理方法,利用Voronoi图组织道路网中的数据点,通过所提的3种算法减少道路网产生的相交节点数和距离计算开销. 文献[20]提出了道路网环境下综合考虑空间距离和社交距离的Skyline组用户查询方法.
Top-k Skyline查询在多目标决策中往往更具优势,因为它可以控制返回的结果集数量. 文献[21]提出基于安全区域技术解决连续Top-k Skyline查询结果更新问题,提出了结合Top-k查询和Skyline查询的安全区域构建算法. 文献[22]提出了MapReduce环境下Top-k Skyline处理方法. 文献[23]将K-Skyband查询与Top-k Skyline查询结合处理大数据集的Top-k Skyline查询.
目前道路网环境下Top-k Skyline查询研究大多集中在单用户场景,较少考虑多用户偏好和权重不同的场景. 针对已有方法的不足,本文利用查询点与数据点的位置关系剪枝数据集,利用所提的K-准放松支配控制结果集数量;利用所提的打分函数返回有序结果集,在理论论证和分析基础上提出了道路网多用户偏好Top-k Skyline查询方法.
2. 主要定义
设道路网环境下数据集P={p1, p2,…, pn},查询用户群G={q1, q2,…, qm}.
定义1. 道路网距离支配. 给定查询用户群G、数据点p1、数据点p2,数据点到查询用户的距离记为Dist. 当且仅当对任意i(1≤i≤m)有Dist(p1, qi)≤Dist(p2, qi),且存在i(1≤i≤m)使得Dist(p1, qi)<Dist(p2, qi)时,称p1道路网距离支配p2,记作p1►p2. 本文的距离如不特殊说明,均指道路网距离.
定义2. 整体属性权重. 给定查询用户群G,用户权重w={w1,w2,…,wm},用户qi的查询关键字keys={C1,C2},C1为优先考虑的属性集合,C2为一般偏好的属性集合,任意维度dj的整体属性权重Wj如式(1):
$W_j=\displaystyle\sum_{i=1}^{m}s_i\cdot w_i, \quad (1)$ 其中si代表属性dj对于用户qi的重要性得分.
在属性的重要性程度计分时,将属性偏好分为3类:优先考虑﹑一般偏好和未考虑. 不同类别分数不同,例如C1中的属性被赋予2分,C2中的属性被赋予1分,未考虑的属性被赋予0分.
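为便于理解式(1)的计算过程,下面给出一段示意性Java代码(假设性示例,并非论文原实现,类名与方法名均为说明而虚构),按文中示例的计分方式(优先考虑2分、一般偏好1分、未考虑0分)计算各维度的整体属性权重:

```java
import java.util.*;

// 式(1)的示意实现(假设性示例): userWeights[i]为用户q_i的权重w_i,
// priority/general分别为各用户优先考虑的属性集合C1与一般偏好的属性集合C2.
public class OverallAttributeWeight {
    public static Map<String, Double> compute(double[] userWeights,
                                              List<Set<String>> priority,
                                              List<Set<String>> general,
                                              List<String> dimensions) {
        Map<String, Double> w = new HashMap<>();
        for (String dj : dimensions) {
            double sum = 0.0;
            for (int i = 0; i < userWeights.length; i++) {
                int s = priority.get(i).contains(dj) ? 2        // 优先考虑记2分
                      : general.get(i).contains(dj) ? 1 : 0;    // 一般偏好记1分, 未考虑记0分
                sum += s * userWeights[i];                      // 累加 s_i * w_i
            }
            w.put(dj, sum);                                     // 维度dj的整体属性权重W_j
        }
        return w;
    }
}
```

例如2个权重分别为0.6和0.4的用户若都将同一属性列入优先考虑集合C1,则该属性的整体属性权重为2×0.6+2×0.4=2.0.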
定义3. 用户群权重偏好次序. 指针对查询用户群属性的有序集合 GP={d1, d2, …, di},其中di代表任意属性,GP中属性对用户群的重要性程度呈非递增排列. 用户群权重偏好次序综合考虑用户的偏好和权重.
定义4. K-准放松支配(KPRD). 设P为数据集,数据维度空间为D,dj为任意维度,总维度数为r,θ=(θ1,θ2,…,θK)是D上K个维度的无差异阈值. 数据点pi,pt∈P,pi K-准放松支配pt,记作piϾpt,当且仅当:
$\begin{cases}|\{\,j \mid p_i[d_j]-p_t[d_j]>\theta_j\,\}|=0,\\ |\{\,j \mid p_i[d_j]-p_t[d_j]>0\,\}| < |\{\,j \mid p_t[d_j]-p_i[d_j]>0\,\}|,\end{cases} \quad (2)$ 其中1≤j≤K,且各维度取值越小越优(与定义1一致).
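按照定义4,K-准放松支配的判断可用如下Java代码示意(假设性示例,非论文原代码;约定各维度取值越小越优,数组下标对应按用户群权重偏好次序重排后的维度):

```java
// K-准放松支配(式(2))的示意实现(假设性示例).
public class KPRDominance {
    /** theta为当前轮次的无差异阈值向量(未放松的维度取0), 返回true表示pi K-准放松支配pt */
    public static boolean kprDominates(double[] pi, double[] pt, double[] theta) {
        int beyondTheta = 0; // |{ j : pi[dj] - pt[dj] > theta_j }|, 条件1要求其为0
        int piWorse = 0;     // |{ j : pi[dj] - pt[dj] > 0 }|, pi较差的维度数
        int piBetter = 0;    // |{ j : pt[dj] - pi[dj] > 0 }|, pi较优的维度数
        for (int j = 0; j < theta.length; j++) {
            if (pi[j] - pt[j] > theta[j]) beyondTheta++;
            if (pi[j] - pt[j] > 0) piWorse++;
            if (pt[j] - pi[j] > 0) piBetter++;
        }
        return beyondTheta == 0 && piWorse < piBetter; // 分别对应式(2)的2个条件
    }
}
```

当θ的各分量均为0时,上述判断退化为经典的严格支配,这与定理4的结论一致.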
定义5. 道路网多用户偏好Top-k Skyline查询. 给定道路网路段集R、查询用户群G、数据集P、用户的查询关键字集合keys和用户权重集合w,道路网多用户偏好Top-k Skyline查询返回P的一个子集. 该子集中数据点在道路网的距离维度和静态维度都不能被P中任意其他数据点支配,并且是根据用户群偏好和权重排序的Top-k个数据点. 本文将道路网多用户偏好Top-k Skyline查询方法记作MUP-TKS.
3. 道路网多用户偏好Top-k Skyline查询
本文提出的道路网多用户偏好Top-k Skyline查询方法主要分为3个部分:距离较优集选取﹑K-准放松支配和Top-k个数据点选取.
3.1 道路网距离较优集选取方法
定义6. Mindist距离[24]. r维欧氏空间中,点p到同一空间内某矩形N的最小距离为Mindist(N, p).
定义7. Edist距离. 设查询用户群的最小外接矩形(minimum bounding rectangle,MBR)为Q,数据点构成的MBR为N,p为N中任意数据点,则min{Mindist(p, Q)}为(Q, N)最小欧氏距离,记作Edistmin;max{Mindist(p, Q)}为(Q, N)最大欧氏距离,记作Edistmax.
定义8.Ndist距离. 设查询用户群的MBR为Q,数据点p的MBR为N,有min{Ndist(p, Q)}为(Q, N)最小网络距离,记作Ndistmin;max{Ndist(p, Q)}为(Q, N)最大网络距离,记作Ndistmax,其中p为N中的任意数据点,Ndist(p,Q)为p到Q的网络距离.
定理1. 设查询用户群的MBR为Q,道路网中数据点构成的2个中间节点分别为N1,N2,若DE1=Edistmin(Q, N2),DE2=Edistmax(Q, N1),DN1=Ndistmax(Q, N1),并且DE1>DE2,DE1>DN1,则N1►N2,且N2中任意数据点都被N1中数据点距离支配.
证明. 假设DN2=Ndistmin(Q, N2),因为欧氏距离值一定小于等于道路网距离值,所以当DE1>DE2且DE1>DN1时一定有DN2≥DE1,可得DN2>DN1,即N2中数据点到Q的最小网络距离大于N1中数据点到Q的最大网络距离,进而可得N2中任意数据点到Q的网络距离都大于N1中任意数据点到Q的网络距离. 因此N1►N2,且N2中任意数据点被N1中任意数据点道路网距离支配.证毕.
剪枝规则1. 设数据点构成的MBR分别为N1,N2,查询用户群的MBR为Q,如果满足:Edistmax(Q, N1)≤Edistmin(Q, N2),并且Ndistmax(Q, N1)<Edistmin(Q, N2),则节点N2可被剪枝.
定义9. 道路网最大距离的最小值. 给定数据点p1,p2,查询用户群G,数据点p到查询点q的道路网距离为Ndist(p, q). 若有DN1=max{Ndist(p1, qi)},DN2=max{Ndist(p2, qi)}(1≤i≤m),并且DN1<DN2,则当前道路网最大距离的最小值为DN1,记作DN_MaxMin.对应的数据点为p1.
定理2. 若节点N的Edistmin(Q, N)>DN_MaxMin,则节点N可被剪枝.
证明. 因为Edistmin(Q, N)>max{Ndist(p, qi)}(1≤i≤m),所以Ndistmin(Q, N)>max{Ndist(p, qi)},即p►N,且N中数据点也被p距离支配.证毕.
剪枝规则2. 若Edistmin(Q, N)≥DN_MaxMin,则节点N被支配,即N和N中数据点{p1, p2,···, pi}被剪枝.
如图1所示,数据点p1,p2到查询用户群{q1, q2, q3}的最大网络距离分别为DN1,DN2,有DN1>DN2,则DN_MaxMin=DN2. 数据点{p3,p4,p5,p6,p7,p8}构成的MBR为N1. 若Edistmin(Q, N1)>DN_MaxMin,则N1中任意数据点p到各查询用户的网络距离满足Ndist(p, qi)≥Edist(p, qi)≥Edistmin(Q, N1)>DN_MaxMin≥Ndist(p2, qi)(1≤i≤3),所以p2►N1,N1可被剪枝.
定理3. 设DE为数据点pi到查询用户qj的欧氏距离,若min{DE(pi, qj)}>DN_MaxMin(1≤j≤m),则pi被剪枝.
证明. 假设p1为DN_MaxMin对应的数据点. 若min{DE(pi, qj)}>DN_MaxMin,由网络距离不小于欧氏距离可得Ndist(pi, qj)>DN_MaxMin(1≤j≤m),即数据点p1►pi,pi可被剪枝.证毕.
剪枝规则3. 假设数据点p1为DN_MaxMin对应的数据点,若存在DN_MaxMin<min{DE(pi, qj)}(1≤j≤m),则p1►pi,可将pi从候选集中删除,其中pi为任意不为p1的数据点.
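剪枝规则1~3本质上都是用易于计算的欧氏距离界去约束道路网距离,其判断条件可用如下Java代码示意(假设性示例,各距离值假定已按定义6~8计算得到,方法名为说明而虚构):

```java
// 剪枝规则1~3的判断条件示意(假设性示例).
public class PruneRules {
    /** 剪枝规则1: Edistmax(Q,N1)≤Edistmin(Q,N2)且Ndistmax(Q,N1)<Edistmin(Q,N2)时剪掉N2 */
    public static boolean rule1(double maxE_N1, double maxN_N1, double minE_N2) {
        return maxE_N1 <= minE_N2 && maxN_N1 < minE_N2;
    }

    /** 剪枝规则2: 节点N到Q的最小欧氏距离不小于当前DN_MaxMin时剪掉N */
    public static boolean rule2(double minE_N, double dnMaxMin) {
        return minE_N >= dnMaxMin;
    }

    /** 剪枝规则3: 数据点到各查询用户欧氏距离的最小值大于DN_MaxMin时剪掉该点 */
    public static boolean rule3(double[] deToUsers, double dnMaxMin) {
        double minDE = Double.MAX_VALUE;
        for (double d : deToUsers) minDE = Math.min(minDE, d);
        return minDE > dnMaxMin;
    }
}
```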
为了减少计算,在剪枝前基于路网数据点的网络Voronoi图构建Vor-R*-DHash索引结构,如图2所示.
Vor-R*-DHash索引结构构造过程有3步:
1)构建路网所有数据点的网络Voronoi图.
2)创建R*-tree.从R*-tree的根部开始,从上至下、从左至右给每个节点编号,从0开始编号.
3)构建2级HashMap结构,第1级HashMap为first_hash、key为R*-tree中每个节点编号;第2级HashMap为sec_hash、key为后续剪枝处理需要的值,包括isNode(非数据点的节点)、MinE(节点到Q的最小欧氏距离值)、MaxE(节点到Q的最大欧氏距离值 )、MinN(节点到Q的最小网络距离值)、MaxN(节点到Q的最大网络距离值)、{DN1, DN2,…, DNi}(数据点到各查询用户的网络距离)、{DE1, DE2,…, DEi}(数据点到各查询用户的欧氏距离).
2级key对应的value值初始都为空,若数据点根据剪枝规则提前被剪枝,则这些值无需计算.DEi,DNi的值也是后续需要使用才被计算,并存入sec_hash.
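Vor-R*-DHash中的2级HashMap可用如下Java代码示意(假设性示例,类名与字段组织为说明而虚构),其关键点是value按需延迟计算,被剪枝节点对应的值不会被填充:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// 2级HashMap结构的示意实现(假设性示例): first_hash以R*-tree节点编号为key,
// sec_hash以isNode、MinE、MaxE、MinN、MaxN等字符串为key, value延迟计算.
public class VorRStarDHash {
    private final Map<Integer, Map<String, Object>> firstHash = new HashMap<>();

    /** 取出(必要时创建)某节点编号对应的sec_hash */
    public Map<String, Object> secHash(int nodeId) {
        return firstHash.computeIfAbsent(nodeId, id -> new HashMap<>());
    }

    /** 仅在剪枝判断确实需要某个值时才计算并缓存, 避免无谓的距离计算 */
    public Object getOrCompute(int nodeId, String key, Supplier<Object> calc) {
        return secHash(nodeId).computeIfAbsent(key, k -> calc.get());
    }
}
```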
基于剪枝规则1~3和Vor-R*-DHash索引结构,进一步给出距离较优集选取方法,如算法1所示.
算法1. 距离较优集选取方法 G_DBC.
输入:查询用户群G,道路网路段集R,数据集P;
输出:距离维度不被支配的距离较优集DBC.
① 以P中数据点、道路网路段集R构建 Vor- R*-DHash索引;
② 构建查询用户群的最小外包矩形Q;
③ 初始化DBC←∅;
④ 根据索引找到距离查询用户最近的点point;
⑤ 将point加入DBC中;
⑥ 计算数据点到各查询用户的网络距离 Ndist(p, qi)、欧氏距离DE(p, qi);
⑦ 将数据点网络距离、欧氏距离存入sec_hash;
⑧ 找到数据点到查询用户的网络距离最大值 的最小值;
⑨ DN_MaxMin←min{Ndist(p,qi)}; /*将最小值 赋给DN_MaxMin*/
⑩ 将数据点父节点Ni加入队列queue中;
⑪ 计算Ni到Q的最小、最大欧氏距离和最 小、 最大网络距离,存至sec_hash;
⑫ N1←min{MaxN}; /*将当前支配能力最强的 节点赋值给N1*/
⑬ for node in queue do
⑭ if node的孩子节点都被访问过 then
⑮ 将node的父节点加入queue中;/*向上 一层访问*/
⑯ end if
⑰ if node的孩子节点N为非叶子节点 then
⑱ 计算N到Q的欧氏距离DE1;
⑲ if DE1 > DN_MaxMin then
⑳ Cut N;/*将N剪枝,剪枝规则2*/
㉑ else if MaxE(N1) < MinE(N) 且 MaxN(N1)<MinE(N) then
㉒ Cut N;/*剪枝规则1*/
㉓ else
㉔ 将N加入队列queue;
㉕ 计算N到Q的最小、最大网络距离, 并 存至sec_hash;
㉖ 更新N1←min{MaxN};/*当前支配能力强 的节点赋给N1*/
㉗ end if
㉘ end if
㉙ if node的孩子节点N为叶子节点 then
㉚ 计算数据点到各查询用户欧氏距离 DE(p,qi);
㉛ if min{DE(p,qi)} > DN_MaxMin then
㉜ Cut N;/*剪枝规则3*/
㉝ else
㉞ 计算N到各查询用户网络距离DN;
㉟ if min{DN} > DN_MaxMin then
㊱ Cut N;
㊲ else
㊳ 将N与DBC中数据点支配比较;
㊴ if N被支配 then
㊵ Delete N;
㊶ else
㊷ 将N加入DBC中;
㊸ 更新DN_MaxMin←min{DN};
㊹ end if
㊺ end if
㊻ end if
㊼ end if
㊽ end for
㊾ return DBC.
算法1首先构建Vor-R*-DHash索引和查询用户群最小外接矩形Q,可快速得到距离查询点最近的数据点point,计算并保存sec_hash所需数据. 将point加入距离较优集DBC,并初始化DN_MaxMin. 接着将point父节点加入队列queue中,计算并保存sec_hash所需数据,并初始化N1. 每次取出队头节点处理,依据剪枝规则1~3进行节点的剪枝或者将节点加入DBC,并判断是否需要更新N1,DN_MaxMin等值,直至队列为空,循环结束. 最后返回距离较优集DBC.
3.2 数据集的放松支配过程
3.2.1 获取用户群权重偏好次序
首先初始化整体属性权重集合W={W1,W2,…,Wi}={0,0,…,0};接着计算每个属性的整体属性权重值得到W;最后对整体属性权重值不为0的属性降序排列,得到属性的重要性次序,即用户群权重偏好次序.
在获取用户群权重偏好次序时,为了减小计算开销,利用HMap1,HMap2分别保存优先考虑的属性和一般偏好的属性. 当用户发起查询时,将C1中属性作为键,对应的用户权重作为值保存到HMap1;将C2中属性作为键,对应的用户权重作为值保存到HMap2.
进一步给出获取用户群权重偏好次序算法CDW,如算法2所示.
算法2. 获取用户群权重偏好次序算法 CDW.
输入:用户群G,用户查询关键字keys,用户权重w,维度空间D;
输出:用户群权重偏好次序GP.
① 初始化W为0; /*大小为数据集维度数*/
② 根据keys,w创建HMap1,HMap2;
③ for dj ∈D do
④ 基于HMap1、HMap2和式(1)得到Wj ;
⑤ end for
⑥ 根据W降序得到用户群权重偏好次序GP;
⑦ return GP. /*返回用户群权重偏好次序*/
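算法2第⑥行"根据W降序得到GP"的处理(过滤零权重维度并按整体属性权重排序)可用如下Java代码示意(假设性示例,延续前文虚构的整体属性权重计算结果W):

```java
import java.util.*;

// 用户群权重偏好次序GP的示意实现(假设性示例): 过滤整体属性权重为0的维度,
// 其余维度按W_j非递增排序.
public class CDWOrder {
    public static List<String> preferenceOrder(Map<String, Double> w) {
        List<String> gp = new ArrayList<>();
        for (Map.Entry<String, Double> e : w.entrySet()) {
            if (e.getValue() > 0) gp.add(e.getKey());            // 只保留W_j>0的维度
        }
        gp.sort((a, b) -> Double.compare(w.get(b), w.get(a)));   // 按W_j降序
        return gp;
    }
}
```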
3.2.2 基于用户群权重偏好次序的K-准放松支配
获取用户群偏好次序后,基于该次序进行放松支配处理. 本文中K为整体属性权重值不为0的维度数. 放松支配过程的处理对象为DBC与静态Skyline集取并集后的集合S. 经K-准放松支配后得到数量可控的候选结果集CS.
定理4. 任意2个数据点pi,pj∈P,若第i(i>0)轮在K个维度上piϾpj,则数据点pi必定在前K–i维支配数据点pj.
证明. 若在第i轮piϾpj,可知该轮的无差异阈值为(0,0,…,0,θK−i+1,…,θK),进而可得前K–i维使用的无差异阈值为(0,0,…,0),所以前K–i维为严格支配比较,即数据点pi必定在前K–i维支配数据点pj.证毕.
定理5. 数据集P经过第i(i>1)轮放松支配后所得结果集Si一定是第i–1轮放松支配后所得结果集Si−1的子集.
证明. 设第i轮放松的维度为第(K–i+1)~K维,第i–1轮放松的维度为第(K–i+2)~K维,其余维度使用严格支配. 可知第i轮的无差异阈值为(0,0,…,0,θK−i+1,θK−i+2,…,θK),第i–1轮的无差异阈值为(0,0,…,0,θK−i+2,…,θK),进而可知第i–1轮在前K–i+1个维度为严格支配比较,即在前K–i+1个维度的无差异阈值为(0,0,…,0). 第i轮不同于第i–1轮之处在于对第K–i+1维进行了放松支配,即在前K–i+1个维度无差异阈值为(0,0,…,0,θK−i+1),所以有Si⊆Si−1.证毕.
由定理4、定理5可直接得出定理6.
定理6. 给定数据集S,结果集数量随着每一轮放松而呈单调非递增趋势,即
$|KPRD(i,D,S)| \leqslant |KPRD(i-1,D,S)|. \quad (3)$
为使返回的结果集更符合用户群偏好,并保证数量可控,基于定理4~6进行逐次放松支配. 逐次放松支配过程中,θ是D上K个维度的无差异阈值,θ=(θ1, θ2, …, θK). 假定当前放松轮次为第i轮(1≤i≤K),无差异阈值θ=(0,0,…,0,θK−i+1,…,θK),即只放松对用户群而言最不重要的i个维度;位于维度dK−i+1之前的维度对用户群更重要,因此该轮对维度d1~dK−i仍使用严格支配比较. 放松支配从对用户群而言最不重要的属性开始,并预先将数据点按照用户群权重偏好次序非递增排序,距离维度值用数据点到查询用户群网络距离的最大值表示.
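按上述设定,第i轮放松所用的无差异阈值向量可由完整阈值θ构造得到,下面的Java代码给出一种示意(假设性示例,维度下标0对应用户群权重偏好次序中最重要的维度):

```java
// 第i轮放松使用的无差异阈值向量构造示意(假设性示例):
// 前K-i个维度取0(严格支配), 仅放松最不重要的i个维度.
public class RelaxSchedule {
    public static double[] thetaForRound(double[] theta, int round) {
        int k = theta.length;
        double[] t = new double[k];
        for (int j = 0; j < k; j++) {
            t[j] = (j < k - round) ? 0.0 : theta[j];
        }
        return t;
    }
}
```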
基于以上讨论,进一步给出基于用户群权重偏好次序的K-准放松支配算法KPRD,如算法3所示.
算法3. 基于用户群权重偏好次序的K-准放松支配算法KPRD.
输入:用户群G,无差异阈值θ,并集S,数据维度空间D,k值,用户查询关键字keys,用户权重w;
输出:候选结果集CS.
① GP←call CDW(G, keys, w, D);/*调用算法2 获取用户群权重偏好次序GP*/
② K←|GP|; /*整体属性权重值大于0的 维度数*/
③ 根据GP调整S中数据点;
④ 初始化CurS←S; /*CurS为每轮放松支配后 的结果集*/
⑤ 初始化oldCount←|S|; /*保存上一轮结果集 个数*/
⑥ 初始化curCount ←|CurS|;/*保存本轮结果集 个数*/
⑦ for j = K to 1 do /*进行最多K轮放松支配*/
⑧ for every pi,pj ∈ CurS do
⑨ if piϾpj then
⑩ 将pj从CurS删除;
⑪ curCount = curCount −1;
⑫ end if
⑬ end for
⑭ if oldCount ≥ k 且 curCount < k then
⑮ CS←S;
⑯ return CS;/*返回上一轮的结果集*/
⑰ else
⑱ 将CurS结果集保存至文件;
⑲ S←CurS;/*更新S*/
⑳ oldCount←|S|;/*更新oldCount*/
㉑ end if
㉒ end for
㉓ CS←CurS;
㉔ return CS.
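算法3的主循环可用如下简化的Java代码示意(假设性示例,复用前文虚构的KPRDominance与RelaxSchedule;为突出逻辑,此处用两重循环做支配比较,未包含将每轮结果集保存至文件等细节):

```java
import java.util.*;

// 逐轮放松支配的简化示意(假设性示例): s中各点的维度已按用户群权重偏好次序重排.
public class KPRDRounds {
    public static List<double[]> relax(List<double[]> s, double[] theta, int k) {
        List<double[]> cur = new ArrayList<>(s);
        for (int round = 1; round <= theta.length; round++) {
            double[] t = RelaxSchedule.thetaForRound(theta, round);
            List<double[]> next = new ArrayList<>();
            for (double[] p : cur) {                          // 删除本轮被K-准放松支配的点
                boolean dominated = false;
                for (double[] q : cur) {
                    if (q != p && KPRDominance.kprDominates(q, p, t)) { dominated = true; break; }
                }
                if (!dominated) next.add(p);
            }
            if (cur.size() >= k && next.size() < k) {
                return cur;                                   // 对应算法3的⑭~⑯: 返回上一轮结果集
            }
            cur = next;
        }
        return cur;
    }
}
```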
3.3 Top-k个数据点选取方法
通过放松支配处理后可有效控制返回用户群的结果集大小,本节进一步给出Top-k个数据点选取策略,使返回结果集有序. 利用z-整体属性权重值的打分函数选取Top-k个数据点,处理对象为候选结果集CS.
定义10. 单调打分函数F[25]. 单调打分函数以数据集中的数据点为输入,将数据点映射到实数范围. F由r个单调函数构成,F={f1, f2, …, fr}. 对于数据集中任意数据点p,有$F(p)=\sum_{j=1}^{r} f_j(p[d_j])$,其中fj(p[dj])为数据点在dj维度上的单调函数.
定理7. 假设数据集P的单调打分函数为F,若数据集中某一元组具有最高的分数,那么它一定是Skyline点.
证明. 以反证法进行证明. 假设有p1,p2∈P,p1的得分F(p1)为数据集的最高得分,F(p1)>F(p2),p1不是Skyline点,p2支配p1,p1[dj]≤p2[dj](1≤j≤r),则可得$\sum_{j=1}^{r} f_j(p_1[d_j]) \leqslant \sum_{j=1}^{r} f_j(p_2[d_j])$,即F(p1)≤F(p2),与假设矛盾.证毕.
定理8. 数据集P根据任意单调打分函数所得数据点顺序是Skyline支配的拓扑顺序.
证明. 以反证法进行证明. 假设存在2个数据点p1,p2∈P,单调打分函数为F,p1支配p2,F(p1)<F(p2),根据定理7可知,p1支配p2,则有F(p1)≥F(p2),与假设矛盾. 所以如果F(p2)>F(p1),可能有p2支配p1,但可以确定p1不可能支配p2. 如果F(p1)=F(p2),则p1支配p2或p2支配p1(这两者是等价的,会根据属性的映射关系排序),或者p1和p2之间不具备支配关系. 因此依据打分函数F所得数据点顺序是按照Skyline支配关系的一个拓扑顺序.证毕.
定义11. 线性打分函数[25]. 给定线性打分函数L,一般化形式为$L(p)=\sum_{j=1}^{r}\omega_j\cdot p[d_j]$,其中ωj为实常数,p[dj]为数据点在dj维度的取值.
定义12. z-整体属性权重值. 给定数据集P,数据点 {p_i}\in P,pi在dj维度的z-整体属性权重值为
$\varphi_{i,j}=\dfrac{V_{i,j}-\mu}{\sigma}\cdot W_j\cdot \zeta_j, \quad (4)$ 其中,$(V_{i,j}-\mu)/\sigma$为数据点pi在维度dj的z值,Wj为dj的整体属性权重值,ζj为dj的维度优劣值,ζj=1或ζj=−1. 由定义10~12可知,fj(p[dj])=φi,j=ωj·z,ωj=Wjζj.
定理9. 数据点任意维度的fj(p[dj])是单调的.
证明. 因为ωj=Wjζj,在打分阶段Wj为实常数,所以可得ωj为实常数,且随着数据点维度值变大,它的z值也变大,因此数据点的任意维度fj(p[dj])是单调的.证毕.
定义13. 基于z-整体属性权重值的打分函数. 数据点pi各维度z-整体属性值之和为它的得分,记作F(pi):
$F(p_i)=\displaystyle\sum_{j=1}^{r}\varphi_{i,j}. \quad (5)$
定理10. F(pi)是单调打分函数.
证明. 因为有$F(p_i)=\sum_{j=1}^{r} f_j(p_i[d_j])$,根据定理9可知数据点的任意维度fj(p[dj])随着维度值变大单调递增,它们具备相同的单调性,因此F(pi)也是单调的.证毕.
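基于式(4)(5)的打分过程可用如下Java代码示意(假设性示例:文中未明确μ,σ的统计口径,此处假定按候选结果集CS在对应维度上的均值与标准差计算):

```java
// 基于z-整体属性权重值的打分示意(假设性示例, 非论文原实现).
public class ZScoreRanker {
    /** mu、sigma为各维度的均值与标准差(此处假定按CS统计), w为整体属性权重, zeta取+1或-1 */
    public static double score(double[] p, double[] mu, double[] sigma,
                               double[] w, double[] zeta) {
        double f = 0.0;
        for (int j = 0; j < p.length; j++) {
            double z = (p[j] - mu[j]) / sigma[j];   // 式(4)中的z值
            f += z * w[j] * zeta[j];                // phi_{i,j} = z · W_j · zeta_j
        }
        return f;                                   // 式(5): F(p_i)
    }
}
```

对CS中各点计算得分后按降序排序并取前k个,即为返回给用户群的Top-k Skyline结果集.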
进一步给出Top-k个数据点选取方法,如算法4所示.
算法4.Top-k个数据点选取方法TK_DC.
输入:候选结果集CS,整体属性权重集合W,维度优劣集合ζ;
输出:Top-k Skyline结果集.
① for pi∈CS do
② 计算数据点的z-整体属性权重值;/*根据 式(4)*/
③ 计算数据点得分;/*根据式(5) */
④ end for
⑤ 根据F(pi)降序排序;
⑥ return Top-k个数据点.
算法4主要对经过算法3处理后的候选结果集CS打分:行②③根据式(4)(5)计算CS中各数据点的得分,行⑤⑥按得分对数据点降序排序,输出Top-k Skyline结果集给用户群.
综合距离较优集选取、K-准放松支配和Top-k个数据点选取的处理过程,进一步给出完整的MUP-TKS查询算法,如算法5所示.
算法5. 道路网多用户偏好Top-k Skyline查询算法MUP-TKS.
输入:数据集P,道路网路段集R,用户群G,用户查询关键字keys,用户权重w,无差异阈值θ,k,维度优劣集合ζ;
输出:Top-k Skyline结果集.
① 预先计算保存数据集的静态Skyline 集;
② 距离较优集选取方法G_DBC;/*调用算法1*/
③ 对距离较优集与静态Skyline集求并集S;
④ K-准放松支配算法KPRD; /*调用算法 3*/
⑤ Top-k 个数据点选取方法TK_DC. /*调用算 法4*/
4. 实验比较与分析
本节主要对MUP-TKS进行实验以及性能评估. 实验对比算法为道路网单用户偏好Skyline算法UP-BPA[26]、K支配空间偏好Skyline算法KSJQ[23]以及基于时间道路网多用户偏好Skyline算法DSAS[27]. UP-BPA算法适用于道路网单用户,为了更好地与本文所提MUP-TKS进行对比,将其扩展,对查询用户群的每个用户分别运行该算法;再对子结果集取并集,得到候选结果集CS;最后对候选结果集基于z-值的打分函数打分,得到Top-k个数据点,扩展后的算法称为EUP-BPA. 将KSJQ算法扩展,对每个用户单独执行该算法,用户偏好对应它的K个子空间;对每个用户的结果集取并集后得到候选结果集;对候选结果集CS基于z-值的打分函数打分,得到Top-k个Skyline结果集,扩展后的算法称为EKSJQ. 将DSAS算法扩展,对满足不同用户需求的数据点基于z-值打分函数打分,按照数据点得分从高至低返回Top-k个Skyline结果集,扩展后的算法称为EDSAS.
4.1 数据集及实验环境设置
实验使用真实道路网数据集. 道路网数据集是北美2.5×10⁷ km²范围内的路段信息,它包含175813个节点和179179条边;兴趣点数据集来自北美酒店及登记信息. 查询用户采用随机生成的方式. 本文使用Vor-R*-DHash索引结构组织数据集. 实验参数取值范围如表1所示,每个用户最大关注维度为4. 每个实验采取单一变量原则,其余变量为默认值,实验结果取30次实验运行的平均值.

表 1 实验参数设置
Table 1. Experimental Parameter Setting

| 参数 | 取值范围 |
| --- | --- |
| 用户数量 | 5, 10, 15, 20, 25, 30, 35 |
| 数据集规模 | 1×10⁴, 2×10⁴, 3×10⁴, 4×10⁴, 5×10⁴ |
| 数据维度 | 5, 7, 9, 11, 13, 15, 17 |
| 无差异阈值(标准差) | 0.1倍, 0.5倍, 1倍, 2倍, 10倍 |
| 获取数据点数量k | 2, 4, 6, 8, 10 |

注:加粗数值表示参数默认取值.

实验环境为:Windows 10(64位),Intel Core i5-5200U CPU @ 2.20GHz处理器,12GB运行内存. 在IntelliJ IDEA开发平台上使用Java实现本文所提的算法MUP-TKS和对比算法EUP-BPA,EKSJQ,EDSAS.
4.2 算法对比实验
1)用户数量对算法性能的影响
为了分析用户数量对算法性能的影响,本实验对不同用户数量下的MUP-TKS,EKSJQ,EDSAS,EUP-BPA算法进行测试,观察算法在不同用户数量下的CPU运行时间、候选结果集CS数量的变化情况.
图3展示了4种算法在不同用户数量下CPU运行时间变化情况.由图3可知,随着用户数量的增加,4种算法的CPU运行时间都在增加. 因为用户数量增加导致不同用户的偏好情况增加,从而需要更多时间处理用户偏好. MUP-TKS的CPU运行时间增长趋势没有其他3种算法的增长趋势大,主要原因是MUP-TKS将多用户的偏好转换成用户群权重偏好次序,对数据集按照该次序预排序,再进行K-准放松支配,使用户数量增加对CPU运行时间的影响减小.
图4展示了4种算法随着用户数量的变化,候选结果集CS数量的变化情况. 由图4可知随着用户数量的增加,CS的数量变大. 但MUP-TKS,EKSJQ,EDSAS算法的变化趋势远没有EUP-BPA算法的变化趋势大,主要因为EUP-BPA算法需要对每个用户进行偏好Skyline查询,再合并各用户的偏好Skyline结果集.
2)数据规模对算法性能的影响
为了分析数据规模对MUP-TKS性能的影响,本实验对不同数据规模下的MUP-TKS,EKSJQ,EDSAS,EUP-BPA算法进行测试,观察4种算法在不同数据规模下CPU运行时间、CS数量的对比情况.
由图5可知,随着数据集规模变大,CPU运行时间不断增加,因为当数据集规模变大时,需要比较的元组数量增加. 而MUP-TKS的增长趋势比其他3种算法小,主要因为MUP-TKS利用剪枝策略和Vor-R*-DHash索引提前剪枝大量不可能成为Skyline的数据点,减少了元组比较次数.
3)k值对算法性能的影响
图6展示了4种算法随着k值变化CPU运行时间变化的情况. 随着k值变化,MUP-TKS的CPU运行时间没有太大变化,因为MUP-TKS在每一轮放松支配后会保存结果集,当k值变化时,可直接找到对应符合大小要求轮次的CS打分,即可得到Top-k Skyline结果集,该过程时间消耗很小. 而EKSJQ,EUP-BPA算法都需要重新计算,时间消耗较大.
图7展示了4种算法随着k值变化元组比较次数的变化情况. 可以发现MUP-TKS随着k值增大,元组比较次数减少,因为当k值增大时,放松支配的轮次减少. 而随着k值增大,EKSJQ,EUP-BPA算法的元组比较次数增多,因为需要进行更多的支配比较找到Top-k个数据点. 随着k值增大,EDSAS算法的元组比较次数基本没有变化.
4)无差异阈值对算法性能的影响
本实验分析无差异阈值对MUP-TKS性能的影响. 图8展示了MUP-TKS在不同无差异阈值下CPU运行时间的变化情况. 由图8可知,若只考虑第1轮放松时间,无差异阈值变化对第1轮放松的CPU响应时间影响不大,因为不同无差异阈值的初始数据集大小都是相同的,处理相同数据集规模的时间差异不大. 而算法总运行时间随着阈值增大而减小,因为无差异阈值增大后,放松支配时会删减更多被支配的元组.
5. 总 结
本文针对现实生活中道路网多用户场景的偏好Top-k Skyline 查询问题,进行深入分析与研究. 作为道路网上单用户偏好Skyline查询问题的补充,提出了一种基于道路网环境下多用户偏好Top-k Skyline查询方法. 该方法利用剪枝规则和索引减少了距离计算开销,并利用用户群权重偏好次序进行放松支配,使结果集可控. 实验结果表明,本文方法能有效解决道路网多用户偏好查询问题,返回的结果集可以满足多用户偏好与权重需求,可以提供有效参考价值. 下一步研究重点主要集中在对多查询用户移动情况下偏好 Top-k Skyline 查询问题的处理.
作者贡献声明:李松提出了方法思路和技术方案;宾婷亮和郝晓红负责算法优化、完成部分实验并撰写论文;张丽平完成部分实验;郝忠孝提出指导意见并修改论文.
-
表 1 本文与已发表相关论文的异同
Table 1 Similarities and Differences of Our Paper Compared with Published Related Papers
| 相关综述 | 出发点 | 研究角度 | 与本文的主要区别 |
| --- | --- | --- | --- |
| 文献[5] | 解决RL面临的抽象动作(及其时序)和抽象状态表示, 以及在其基础上的高层序贯决策问题. | 借鉴发育学习理论, 依托分层强化学习、课程学习、状态表征等方法, 详细阐述了如何结合内在动机与深度强化学习方法帮助智能体获取知识和学习技能. | 该文重点阐述发育学习理论中2种主要的内在动机模型如何与RL相结合, 以解决稀疏奖励、表征学习、option发现、课程学习等问题, 然而对于内在动机如何解决各类探索问题并未深入研究. |
| 文献[6] | 为适应学习系统的行为, 研究如何优化值函数集合的学习问题. | 将并行价值函数学习建模为强化学习任务, 在提出的并行学习测试环境中, 基于带非静态目标的在线多预测任务设定, 研究和比较不同的内在奖励机制的表现. | 该文重点研究如何利用内在奖励从共享的经验流中学习价值函数集合, 以适应内在驱动学习系统的行为. |
| 文献[7] | 解决深度强化学习和多智能体强化学习在现实场景中的广泛应用和部署面临的瓶颈挑战——探索问题. | 从单智能体和多智能体角度出发, 系统性阐述了各类探索方法在深度强化学习领域的研究情况, 并在常见的基准环境中对典型的探索方法进行了综合对比. | 该文聚焦于阐述覆盖深度强化学习和多智能体强化学习的解决探索问题的多类方法, 基于内在动机的方法并非该论文的研究重点, 因此导致基于内在动机的探索方法覆盖面较小, 讨论深度不够. |
| 文献[8] | 解决未知且随机环境中序贯决策面临的探索问题. | 从智能体探索使用的信息类型出发, 全面阐述了无奖励探索、随机动作选择、基于额外奖励或基于优化的探索等方法在基于MDP的强化学习领域的研究情况. | 该文聚焦于为强化学习解决序贯决策问题中所涉及到的探索方法提供广泛的高层综述, 仅初步介绍了一些基于内在动机的探索方法. |

表 2 基于计数的主要方法小结
Table 2 Summary of Main Methods Based on Count
| 分类 | 算法 | 内在奖励形式 | 状态表示 | 主要测试环境和效果 |
| --- | --- | --- | --- | --- |
| 基于密度模型的伪计数 | PC[39] (NIPS-16) | CTS密度模型+伪计数的均方根 | | Atari-MR: 50M帧训练后得到2461均分, 100M帧训练后得到3439均分. |
| 基于密度模型的伪计数 | PixelCNN[44] (ICML-17) | PixelCNN密度模型+伪计数的均方根 | | Atari-MR: 100M帧训练后得到6600均分. |
| 间接伪计数 | EX²[47] (NIPS-17) | 判别器评估状态新颖性, 作为间接密度 | CNN | Doom-MWH: 平均成功率大于74%, 显著高于VIME[58], #Exploration[53], TRPO[59]. |
| 间接伪计数 | DORA[48] (ICLR-18) | 探索价值E-value作为间接计数 | | Atari-FW: DORA[48]在2×10⁶训练步数内收敛, 而PC需1×10⁷训练步数收敛[39]. |
| 间接伪计数 | SR[49] (AAAI-20) | SR的范数作为伪计数 | | Atari-HEG: 与PC[39], PixelCNN[44], RND[60]性能相当或略高. |
| 状态抽象 | #Exploration[53] (NIPS-17) | 基于状态Hash的计数 | Pixel, BASS, AE | Atari-HEG: 在除Atari-MR的问题上比PC[39]得分高, 在Atari-MR上显著低于PC. |
| 状态抽象 | CoEX[40] (ICLR-19) | 基于contingency-awareness状态表示的伪计数 | 逆动力学预测训练卷积, 注意力mask提取位置信息 | Atari-HEG: 在大部分问题上都比A3C+[39], TRPO-AE-SimHash[53], Sarsa-φ-EB[46], DQN-PixelCNN[44], Curiosity[61]效果好. |

注:CNN (convolutional neural networks), TRPO (trust region policy optimization), RND (random network distillation).

表 3 基于预测模型的主要算法小结
Table 3 Summary of Main Algorithms Based on Predictive Models
| 算法类型 | 算法 | 内在奖励形式 | 状态表示 | 抗噪 | 主要测试环境和效果 |
| --- | --- | --- | --- | --- | --- |
| 基于预测误差 | Static/Dynamic AE[71] (arXiv 15) | 前向动力学模型(仅2层网络)的状态预测误差的2范数平方 | Autoencoder的隐层 | 否 | 14个Atari游戏: 与DQN[72], Thompson sampling, Boltzman方法相比, 优势有限. |
| 基于预测误差 | ICM[61] (ICML-17) | 前向动力学模型的状态预测误差的2范数平方 | 逆动力学辅助训练CNN+ELU | 部分 | Doom-MWH: 探索和导航效率显著高于TRPO-VIME[58]. |
| 基于预测误差 | 文献[74] (ICLR-19) | 前向动力学模型的状态预测误差的2范数平方 | Pixels, RF, VAE[75], 逆动力学特征IDF | 部分 | 在48个Atari游戏、SuperMarioBros、2个Roboschool场景、Two-player Pong、2个Unity迷宫等环境中, Pixel表现较差, VAE[75]不稳定, RF和IDF表现较好, IDF迁移泛化能力强, RF和IDF学习效率受到随机因素影响. |
| 基于预测误差 | RND[60] (ICLR-19) | 状态嵌入预测误差的2范数平方 | PPO[56]策略网络中的卷积层 | 是 | Atari: 1970M帧训练, 在多个Atari-HEG(包括Atari-MR上获得≤8000均分)效果显著好于动力学预测方法. |
| 基于预测误差 | EMI[73] (ICML-19) | 前向动力学模型的状态预测误差的2范数平方 | 前向和逆向动力学互信息最大化 | | rllab任务: 显著优于ICM[61], RND[60], EX²[47], AE-SimHash[53], VIME[58]; Atari-HEG: 大部分游戏中稍优于上述方法. |
| 基于预测误差 | LWM[77] (NeurIPS-20) | 前向动力学模型的状态预测误差的2范数平方 | 最小化时序邻近状态的特征向量W-MSE损失函数 | 是 | Atari-HEG: 50M帧, 大部分游戏上明显优于EMI[73], EX²[47], ICM[61], RND[60], AE-SimHash[53]. |
| 预测结果不一致性 | Disagreement[79] (ICML-19) | 一组前向动力学状态预测误差的方差 | 随机特征/Image-Net预训练的ResNet-18特征 | 是 | Unity迷宫导航: 在noisy TV设置下探索效率明显高于RF下的前馈模型[74]. |
| 预测结果不一致性 | 文献[81] (ICML-20) | 对动力学模型后验分布的采样方差 | | 是 | rllab任务: 优于Disagreement[79], MAX[80], ICM[61]. |
| 预测精度提升 | 文献[82] (ICML-17) | 基于预测损失的提升或网络复杂度的提升的多种奖励 | | | 语言建模任务(n-gram模型, repeat copy任务和bAbI任务): 显著提升了学习效率, 甚至达到了1倍. |

表 4 基于信息论的主要方法小结
Table 4 Summary of Main Methods Based on Information Theory
| 算法类型 | 算法 | 内在奖励形式 | 状态表示 | 抗噪 | 主要测试环境和效果 |
| --- | --- | --- | --- | --- | --- |
| 信息增益 | VIME[58] (NIPS-16) | 预测模型参数的累计熵减(推导为前后参数的KL散度) | | 是 | rllab的多个任务(包括层次性较强的SwimmerGather): 得分显著高于TRPO和基于L2预测误差的TRPO. |
| 信息增益 | Surprisal[90] (arXiv 17) | 惊奇: 真实转移模型与学习模型参数之间的KL散度近似 | | 是 | 多个较困难的rllab任务和部分Atari游戏: 仅在部分环境下探索效率高于VIME[58], 但在其他环境与VIME有一定差距. |
| 信息增益 | AWML[69] (ICML-20) | 基于加权混合的新旧动力学模型损失函数之差 | 假定智能体具有面向物体的特征表示能力 | 是 | 多类动态物体的复杂3维环境: 精度明显高于Surprisal[90], RND[60], Disagreement[79], ICM[61]等方法. |
| 最大熵 | MaxEnt[92] (ICML-19) | 最大化状态分布的熵为优化目标, 以状态密度分布的梯度为奖励 | | | Pendulum, Ant, Humanoid控制任务作为概念验证环境: 相比随机策略, 诱导出明显更大的状态熵. |
| 最大熵 | 文献[94] (ICML-19) | 隐状态分布的负对数 | 基于先期任务的奖励预测任务得到最小维度隐状态表示 | | 简单的object-pusher环境: 获得外在奖励的效率显著高于无隐状态表示的MaxEnt[92]. |
| 互信息 | VMI[100] (NIPS-15) | 当前状态下开环option与终止状态的互信息 | CNN处理像素观测信息 | | 简单的静态、动态和追逃的网格世界: 展示了对关键状态的有效识别. |
| 互信息 | VIC[99] (arXiv 16) | 当前状态下闭环option与终止状态的互信息 | | | 简单的网格世界: 证明了对Empowerment的估计比VMI算法[100]更准确. |
| 互信息 | DIAYN[103] (ICLR-19) | 当前状态下闭环option与每一状态的互信息、option下动作的信息熵最大化 | | | 2D导航和连续控制任务: 相对VIC[99]能演化出更多样的技能. |
| 互信息 | DADS[107] (ICLR-20) | 式(19)的正向形式, 兼顾多样性和可预测性 | | | OpenAI Gym的多个控制任务: 与DIAYN[103]相比, 原子技能丰富且稳定, 更有利于组装层次化行为; 大幅提升下游基于模型规划任务的学习效率. |
-
[1] Sutton R S, Barto A G. Reinforcement Learning: An Introduction [M]. Cambridge, MA: MIT Press, 2018
[2] 刘全,翟建伟,章宗长,等. 深度强化学习综述[J]. 计算机学报,2018,41(1):1−27 doi: 10.11897/SP.J.1016.2019.00001 Liu Quan, Zhai Jianwei, Zhang Zongchang, et al. A survey on deep reinforcement learning[J]. Chinese Journal of Computers, 2018, 41(1): 1−27 (in Chinese) doi: 10.11897/SP.J.1016.2019.00001
[3] Liu Xiaoyang, Yang Hongyang, Gao Jiechao, et al. FinRL: Deep reinforcement learning framework to automate trading in quantitative finance [C] //Proc of the 2nd ACM Int Conf on AI in Finance. New York: ACM, 2022: 1−9
[4] 万里鹏,兰旭光,张翰博,等. 深度强化学习理论及其应用综述[J]. 模式识别与人工智能,2019,32(1):67−81 doi: 10.16451/j.cnki.issn1003-6059.201901009 Wan Lipeng, Lan Xuguang, Zhang Hanbo, et al. A review of deep reinforcement learning theory and application[J]. Pattern Recognition and Artificial Intelligence, 2019, 32(1): 67−81 (in Chinese) doi: 10.16451/j.cnki.issn1003-6059.201901009
[5] Aubret A, Matignon L, Hassas S. A survey on intrinsic motivation in reinforcement learning [J]. arXiv preprint, arXiv: 1908.06976, 2019
[6] Linke C, Ady N M, White M, et al. Adapting behavior via intrinsic reward: A survey and empirical study[J]. Journal of Artificial Intelligence Research, 2020, 69: 1287−1332 doi: 10.1613/jair.1.12087
[7] Yang Tianpei, Tang Hongyao, Bai Chenjia, et al. Exploration in deep reinforcement learning: A comprehensive survey [J]. arXiv preprint, arXiv: 2109.06668, 2021
[8] Amin S, Gomrokchi M, Satija H, et al. A survey of exploration methods in reinforcement learning [J]. arXiv preprint, arXiv: 2109.00157, 2021
[9] Lillicrap T P, Hunt J J, Pritzel A, et al. Continuous control with deep reinforcement learning [C/OL] //Proc of the 4th Int Conf on Learning Representations. 2016 [2022-09-06].https://arxiv.org/abs/1509.02971v6
[10] Plappert M, Houthooft R, Dhariwal P, et al. Parameter space noise for exploration [C/OL] //Proc of the 6th Int Conf on Learning Representations. 2018 [2022-09-06].https://arxiv.org/abs/1706.01905
[11] Fortunato M, Azar M G, Piot B, et al. Noisy networks for exploration [C/OL] // Proc of the 6th Int Conf on Learning Representations. 2018 [2022-09-06].https://arxiv.org/abs/1706.10295
[12] 章晓芳, 周倩, 梁斌, 等, 一种自适应的多臂赌博机算法 [J]. 计算机研究与发展, 2019, 56(3): 643−654 Zhang Xiaofang, Zhou Qian, Liang Bin, et al. An adaptive algorithm in multi-armed bandit problem [J]. Journal of Computer Research and Development, 2019, 56(3): 643−654 (in Chinese)
[13] Lai T L, Robbins H. Asymptotically efficient adaptive allocation rules[J]. Advances in Applied Mathematics, 1985, 6(1): 4−22 doi: 10.1016/0196-8858(85)90002-8
[14] Strehl A L, Littman M L. An analysis of model-based interval estimation for Markov decision processes[J]. Journal of Computer and System Sciences, 2008, 74(8): 1309−1331 doi: 10.1016/j.jcss.2007.08.009
[15] Jaksch T, Ortner R, Auer P. Near-optimal regret bounds for reinforcement learning [C] //Proc of the 21st Conf on Neural Information Processing Systems. Cambridge, MA: MIT, 2008: 89–96
[16] Azar M G, Osband I, Munos R. Minimax regret bounds for reinforcement learning [C] //Proc of the 34th Int Conf on Machine Learning. New York: ACM, 2017: 263–272
[17] Jin C, Allen-Zhu Z, Bubeck S, et al. Is q-learning provably efficient [C] //Proc of the 32nd Conf on Neural Information Processing Systems. Cambridge, MA: MIT, 2018: 4868–4878
[18] Kolter J Z, Ng A Y. Near-Bayesian exploration in polynomial time [C] //Proc of the 26th Int Conf on Machine Learning. New York: ACM, 2009: 513–520
[19] Russo D, Van Roy B, Kazerouni A, et al. A tutorial on thompson sampling[J]. Foundations and Trends in Machine Learning, 2018, 11(1): 1−96 doi: 10.1561/2200000070
[20] Osband I, Van Roy B. Why is posterior sampling better than optimism for reinforcement learning [C] //Proc of the 34th Int Conf on Machine Learning. New York: ACM, 2017: 2701–2710
[21] Osband I, Blundell C, Pritzel A, et al. Deep exploration via bootstrapped DQN [C] //Proc of the 30th Conf on Neural Information Processing Systems. Cambridge, MA: MIT, 2016: 4033–4041
[22] Thrun S B. Efficient exploration in reinforcement learning [R]. Pittsburgh, CP: School of Computer Science, Carnegie-Mellon University, 1992
[23] Barto A G, Singh S, Chentanez N. Intrinsically motivated learning of hierarchical collections of skills [C] //Proc of the 3rd Int Conf on Development and Learning. Piscataway, NJ: IEEE, 2004: 112–119
[24] Oudeyer P Y, Kaplan F. What is intrinsic motivation? A typology of computational approaches [J/OL]. Frontiers in Neurorobotics, 2007 [2022-09-06].https://www.frontiersin.org/articles/10.3389/neuro.12.006.2007/full
[25] Harlow H F. Learning and satiation of response in intrinsically motivated complex puzzle performance by monkeys[J]. Journal of Comparative and Physiological Psychology, 1950, 43(4): 289−294 doi: 10.1037/h0058114
[26] Hull C L. Principles of behavior [J/OL]. The Journal of Nervous and Mental Disease, 1945, 101(4): 396. [2022-09-06].https://journals.lww.com/jonmd/Citation/1945/04000/Principles_of_Behavior.26.aspx
[27] Deci E L, Ryan R M, Intrinsic Motivation and Self-Determination in Human Behavior [M]. Berlin: Springer, 2013
[28] Ryan R M, Deci E L. Intrinsic and extrinsic motivations: Classic definitions and new directions[J]. Contemporary Educational Psychology, 2000, 25(1): 54−67 doi: 10.1006/ceps.1999.1020
[29] Barto A, Mirolli M, Baldassarre G. Novelty or surprise [J/OL]. Frontiers in Psychology, 2013, 4: 907. [2023-09-06]. http://www.frontiersin.org/articles/10.3389/fpsyg.2013.00907/full
[30] Czikszentmihalyi M. Flow: The Psychology of Optimal Experience[M]. New York: Harper & Row, 1990
[31] Asada M, Hosoda K, Kuniyoshi Y, et al. Cognitive developmental robotics: A survey[J]. IEEE Transactions on Autonomous Mental Development, 2009, 1(1): 12−34 doi: 10.1109/TAMD.2009.2021702
[32] White R W. Motivation reconsidered: The concept of competence[J]. Psychological Review, 1959, 66(5): 297−333 doi: 10.1037/h0040934
[33] Baldassarre G. What are intrinsic motivations? A biological perspective [C/OL] //Proc of IEEE Int Conf on Development and Learning. 2011 [2022-09-06].https://ieeexplore.ieee.org/document/6037367
[34] Schmidhuber J. Formal theory of creativity, fun, and intrinsic motivation (1990–2010)[J]. IEEE Transactions on Autonomous Mental Development, 2010, 2(3): 230−247 doi: 10.1109/TAMD.2010.2056368
[35] Bellemare M G, Naddaf Y, Veness J, et al. The ARCADE learning environment: An evaluation platform for general agents[J]. Journal of Artificial Intelligence Research, 2013, 47: 253−279 doi: 10.1613/jair.3912
[36] Duan Y, Chen X, Houthooft R, et al. Benchmarking deep reinforcement learning for continuous control [C] //Proc of the 33rd Int Conf on Machine Learning. New York: ACM, 2016: 1329–1338
[37] Kempka M, Wydmuch M, Runc G, et al. VizDoom: A doom-based ai research platform for visual reinforcement learning [C/OL] //Proc of IEEE Conf on Computational Intelligence and Games. 2016 [2022-09-06].https://ieeexplore.ieee.org/document/7860433
[38] Brockman G, Cheung V, Pettersson L, et al. OpenAI gym [J]. arXiv preprint, arXiv: 1606.01540
[39] Bellemare M G, Srinivasan S, Ostrovski G, et al. Unifying count-based exploration and intrinsic motivation [C] //Proc of the 30th Conf on Neural Information Processing Systems. Cambridge, MA: MIT, 2016: 1479–1487
[40] Choi J, Guo Y, Moczulski M, et al. Contingency-aware exploration in reinforcement learning [C/OL] //Proc of the 7th Int Conf on Learning Representations. 2019 [2022-09-06].https://arxiv.org/abs/1811.01483
[41] Veness J, Ng K S, Hutter M, et al. Context tree switching [C] //Proc of Data Compression Conf. Piscataway, NJ: IEEE, 2012: 327–336
[42] Bellemare M, Veness J, Talvitie E. Skip context tree switching [C] //Proc of the 31st Int Conf on Machine Learning. New York: ACM, 2014: 1458–1466
[43] Hasselt H V, Guez A, Silver D. Deep reinforcement learning with double q-learning [C] //Proc of the 30th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2016: 2094–2100
[44] Ostrovski G, Bellemare M G, Oord A, et al. Count-based exploration with neural density models [C] //Proc of the 34th Int Conf on Machine Learning. New York: ACM, 2017: 2721–2730
[45] Oord A, Kalchbrenner N, Vinyals O, et al. Conditional image generation with PixelCNN decoders [C] //Proc of the 30th Conf on Neural Information Processing Systems. Cambridge, MA: MIT, 2016: 4797–4805
[46] Martin J, Narayanan S S, Everitt T, et al. Count-based exploration in feature space for reinforcement learning [C] //Proc of the 26th Int Joint Conf on Artificial Intelligence. Menlo Park: AAAI, 2017: 2471–2478
[47] Fu J, Co-Reyes J D, Levine S. EX2: exploration with exemplar models for deep reinforcement learning [C] //Proc of the 31st Conf on Neural Information Processing Systems. Cambridge, MA: MIT, 2017: 2574–2584
[48] Choshen L, Fox L, Loewenstein Y. DORA the explorer: Directed outreaching reinforcement action-selection [C/OL] //Proc of the 6th Int Conf on Learning Representations. 2018 [2022-09-06].https://arxiv.org/abs/1804.04012
[49] Machado M C, Bellemare M G, Bowling M. Count-based exploration with the successor representation [C] //Proc of the 34th AAAI Conf on Artificial Intelligence. Menlo Park: AAAI, 2020: 5125–5133
[50] Machado M, Rosenbaum C, Guo Xiaoxiao, et al. Eigenoption discovery through the deep successor representation[C/OL] //Proc of the 6th Int Conf on Learning Representations. 2018 [2022-09-06].https://arxiv.org/abs/1710.11089
[51] Schmidhuber J. Curious model-building control systems [C] //Proc of Int Joint Conf on Neural Networks. Piscataway, NJ: IEEE, 1991: 1458–1463
[52] Tao Ruoyu, Franois-Lavet V, Pineau J. Novelty search in representational space for sample efficient exploration[J]. Advances in Neural Information Processing Systems, 2020, 33: 8114−8126
[53] Tang Haoran, Houthooft R, Foote D, et al. #Exploration: A study of count-based exploration for deep reinforcement learning [C/OL]//Proc of the 31st Conf on Neural Information Processing Systems. Cambridge, MA: MIT. 2017 [2022-09-06].https://proceedings.neurips.cc/paper/2017/hash/3a20f62a0af1aa152670bab3c602feed-Abstract.html
[54] Charikar M S. Similarity estimation techniques from rounding algorithms [C] //Proc of the 34th ACM Symp on Theory of Computing. New York: ACM, 2002: 380–388
[55] Bellemare M, Veness J, Bowling M. Investigating contingency awareness using ATARI 2600 games [C/OL] //Proc of the 26th AAAI Conf on Artificial Intelligence. Menlo Park: AAAI. 2012 [2022-09-06].https://www.aaai.org/ocs/index.php/AAAI/AAAI12/paper/view/5162/0
[56] Schulman J, Wolski F, Dhariwal P, et al. Proximal policy optimization algorithms [J]. arXiv preprint, arXiv: 1707.06347, 2017
[57] Song Yuhang, Wang Jianyi, Lukasiewicz T, et al. Mega-Reward: Achieving human-level play without extrinsic rewards [C] //Proc of the 34th AAAI Conf on Artificial Intelligence. Menlo Park: AAAI, 2020: 5826–5833
[58] Houthooft R, Chen Xi, Duan Yan, et al. VIME: Variational information maximizing exploration [C] //Proc of the 30th Conf on Neural Information Processing Systems. Cambridge, MA: MIT, 2016: 1117– 1125
[59] Schulman J, Levine S, Abbeel P, et al. Trust region policy optimization [C] //Proc of the 32nd Int Conf on Machine Learning. New York: ACM, 2015: 1889–1897
[60] Burda Y, Edwards H, Storkey A. Exploration by random network distillation [C/OL] //Proc of the 7th Int Conf on Learning Representations. 2019 [2022-09-06].https://arxiv.org/abs/1810.12894
[61] Pathak D, Agrawal P, Efros A A, et al. Curiosity-driven exploration by self-supervised prediction [C] //Proc of the 34th Int Conf on Machine Learning. New York: ACM, 2017: 2778–2787
[62] Lopes M, Lang T, Toussain M, et al. Exploration in model-based reinforcement learning by empirically estimating learning progress [J/OL]. Advances in Neural Information Processing Systems, 2012 [2022-09-06].https://proceedings.neurips.cc/paper/2012/hash/a0a080f42e6f13b3a2df133f073095dd-Abstract.html
[63] O’Donoghue B, Osband I, Munos R, et al. The uncertainty Bellman equation and exploration [C] //Proc of the 35th Int Conf on Machine Learning. New York: ACM, 2018: 3836–3845
[64] Yu A, Dayan P. Expected and unexpected uncertainty: ACh and NE in the neocortex [C] //Proc of the 15th Conf on Neural Information Processing Systems. Cambridge, MA: MIT, 2002: 173–180
[65] Grossberg S. Adaptive resonance theory: How a brain learns to consciously attend, learn, and recognize a changing world[J]. Neural Networks, 2013, 37: 1−47 doi: 10.1016/j.neunet.2012.09.017
[66] Schmidhuber J. A possibility for implementing curiosity and boredom in model-building neural controllers [C] //Proc of Int Conf on Simulation of Adaptive Behavior: From Animals to Animats. Cambridge, MA: MIT, 1991: 222–227
[67] Thrun S. Exploration in active learning [J/OL]. Handbook of Brain Science and Neural Networks, 1995 [2022-09-06].https://dl.acm.org/doi/10.5555/303568.303749
[68] Huang Xiao, Weng J. Novelty and reinforcement learning in the value system of developmental robots [C/OL] //Proc of the 2nd Int Workshop on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems. Lund, SWE: Lund University Cognitive Studies, 2002: 47–55
[69] Kim K, Sano M, De Freitas J, et al. Active world model learning with progress curiosity [C] //Proc of the 37th Int Conf on Machine Learning. New York: ACM, 2020: 5306–5315
[70] Oudeyer P Y, Kaplan F, Hafner V V. Intrinsic motivation systems for autonomous mental development[J]. IEEE Transactions on Evolutionary Computation, 2007, 11(2): 265−286 doi: 10.1109/TEVC.2006.890271
[71] Stadie B C, Levine S, Abbeel P. Incentivizing exploration in reinforcement learning with deep predictive models [J]. arXiv preprint, arXiv: 1507.00814, 2015
[72] Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529−533 doi: 10.1038/nature14236
[73] Kim H, Kim J, Jeong Y, et al. EMI: Exploration with mutual information [C] //Proc of the 36th Int Conf on Machine Learning. New York: ACM, 2019: 3360–3369
[74] Burda Y, Edwards H, Pathak D, et al. Large-scale study of curiosity-driven learning [C/OL] //Proc of the 7th Int Conf on Learning Representations. 2019 [2022-09-06].https://arxiv.org/abs/1808.04355
[75] Rezende D J, Mohamed S, Wierstra D. Stochastic backpropagation and approximate inference in deep generative models [C] //Proc of the 31st Int Conf on Machine Learning. New York: ACM, 2014: 1278–1286
[76] Savinov N, Raichuk A, Vincent D, et al. Episodic curiosity through reachability [C/OL] //Proc of the 7th Int Conf on Learning Representations. 2019 [2022-09-06].https://arxiv.org/abs/1810.02274
[77] Ermolov A, Sebe N. Latent world models for intrinsically motivated exploration. [C] //Proc of the 34th Conf on Neural Information Processing Systems. Cambridge, MA: MIT, 2020, 33: 5565−5575
[78] Badia A P, Sprechmann P, Vitvitskyi A, et al. Never Give Up: Learning directed exploration strategies [C/OL] //Proc of the 8th Int Conf on Learning Representations. 2020 [2022-09-06].https://arxiv.org/abs/2002.06038
[79] Pathak D, Gandhi D, Gupta A. Self-supervised exploration via disagreement [C] //Proc of the 36th Int Conf on Machine Learning. New York: ACM, 2019: 5062–5071
[80] Shyam P, Jaśkowski W, Gomez F. Model-based active exploration [C] //Proc of the 36th Int Conf on Machine Learning. New York: ACM, 2019: 5779–5788
[81] Ratzlaff N, Bai Q, Fuxin L, et al. Implicit generative modeling for efficient exploration [C] //Proc of the 37th Int Conf on Machine Learning. New York: ACM, 2020: 7985–7995
[82] Graves A, Bellemare M G, Menick J, et al. Automated curriculum learning for neural networks [C] //Proc of the 34th Int Conf on Machine Learning. New York: ACM, 2017: 1311–1320
[83] Holm L, Wadenholt G, Schrater P. Episodic curiosity for avoiding asteroids: Per-trial information gain for choice outcomes drive information seeking[J]. Scientific Reports, 2019, 9(1): 1−16 doi: 10.1038/s41598-018-37186-2
[84] Shannon C E. A mathematical theory of communication[J]. The Bell System Technical Journal, 1948, 27(3): 379−423 doi: 10.1002/j.1538-7305.1948.tb01338.x
[85] Frank M, Leitner J, Stollenga M, et al. Curiosity driven reinforcement learning for motion planning on humanoids [J/OL]. Frontiers in Neurorobotics, 2014 [2022-09-06].https://frontiersin.yncjkj.com/articles/10.3389/fnbot.2013.00025/full
[86] Alemi A A, Fischer I, Dillon J V, et al. Deep variational information bottleneck [C/OL] //Proc of the 5th Int Conf on Learning Representations. 2017 [2022-09-06].https://arxiv.org/abs/1612.00410v5
[87] Kim Y, Nam W, Kim H, et al. Curiosity-bottleneck: Exploration by distilling task-specific novelty [C] //Proc of the 36th Int Conf on Machine Learning. New York: ACM, 2019: 3379–3388
[88] Sun Yi, Gomez F, Schmidhuber J. Planning to be surprised: Optimal bayesian exploration in dynamic environments [C] //Proc of the 4th Conf on Artificial General Intelligence. Berlin: Springer, 2011: 41–51
[89] Chien J T, Hsu P C. Stochastic curiosity maximizing exploration [C/OL] //Proc of Int Joint Conf on Neural Networks. Piscataway, NJ: IEEE. 2020 [2022-09-06].https://ieeexplore.ieee.org/abstract/document/9207295
[90] Achiam J, Sastry S. Surprise-based intrinsic motivation for deep reinforcement learning [J]. arXiv preprint, arXiv: 1703.01732, 2017
[91] Laversanne-Finot A, Pere A, Oudeyer P Y. Curiosity driven exploration of learned disentangled goal spaces [C] //Proc of the 2nd Conf on Robot Learning. New York: ACM, 2018: 487–504
[92] Hazan E, Kakade S, Singh K, et al. Provably efficient maximum entropy exploration [C] //Proc of the 36th Int Conf on Machine Learning. New York: ACM, 2019: 2681–2691
[93] Lee L, Eysenbach B, Parisotto E, et al. Efficient exploration via state marginal matching [C/OL] //Proc of the 8th Int Conf on Learning Representations. 2020 [2022-09-06].https://arxiv.org/abs/1906.05274v1
[94] Vezzani G, Gupta A, Natale L, et al. Learning latent state representation for speeding up exploration [C/OL] //Proc of the 2nd Exploration in Reinforcement Learning Workshop at the 36th Int Conf on Machine Learning. 2019 [2022-09-06].https://arxiv.org/abs/1905.12621
[95] Liu H, Abbeel P. Behavior from the void: Unsupervised active pre-training[J]. Advances in Neural Information Processing Systems, 2021, 34: 18459−18473
[96] Seo Y, Chen L, Shin J, et al. State entropy maximization with random encoders for efficient exploration [C] //Proc of the 38th Int Conf on Machine Learning. New York: ACM, 2021: 9443−9454
[97] Still S, Precup D. An information-theoretic approach to curiosity-driven reinforcement learning[J]. Theory in Biosciences, 2012, 131(3): 139−148 doi: 10.1007/s12064-011-0142-z
[98] Salge C, Glackin C, Polani D. Empowerment—An Introduction [M]. Berlin: Springer, 2014
[99] Gregor K, Rezende D J, Wierstra D. Variational intrinsic control [J]. arXiv preprint, arXiv: 1611.07507, 2016
[100] Mohamed S, Rezende D J. Variational information maximisation for intrinsically motivated reinforcement learning [C] //Proc of the 29th Int Conf on Neural Information Processing Systems. Cambridge, MA: MIT, 2015: 2125– 2133
[101] Sutton R S, Precup D, Singh S. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning[J]. Artificial Intelligence, 1999, 112(1/2): 181−211
[102] Campos V, Trott A, Xiong Caiming, et al. Explore, discover and learn: Unsupervised discovery of state-covering skills [C] //Proc of the 36th Int Conf on Machine Learning. New York: ACM, 2020: 1317–1327
[103] Eysenbach B, Gupta A, Ibarz J, et al. Diversity is all you need: Learning skills without a reward function [C/OL] //Proc of the 7th Int Conf on Learning Representations. 2019 [2022-09-06].https://arxiv.org/abs/1802.06070
[104] Kwon T. Variational intrinsic control revisited [J]. arXiv preprint, arXiv: 2010.03281, 2020
[105] Achiam J, Edwards H, Amodei D, et al. Variational option discovery algorithms [J]. arXiv preprint, arXiv: 1807.10299, 2018
[106] Hansen S, Dabney W, Barreto A, et al. Fast task inference with variational intrinsic successor features [C/OL] //Proc of the 8th Int Conf on Learning Representations. 2020 [2022-09-06].https://arxiv.org/abs/1906.05030
[107] Sharma A, Gu S, Levine S, et al. Dynamics-aware unsupervised discovery of skills [C/OL] //Proc of the 8th Int Conf on Learning Representations. 2020 [2022-09-06].https://arxiv.org/abs/1907.01657
[108] Mirolli M, Baldassarre G. Functions and mechanisms of intrinsic motivations [J/OL]. Intrinsically Motivated Learning in Natural and Artificial Systems, 2013 [2022-09-06].https://linkspringer.53yu.com/chapter/10.1007/978−3-642−32375-1_3
[109] Schembri M, Mirolli M, Baldassarre G. Evolving internal reinforcers for an intrinsically motivated reinforcement-learning robot [C] //Proc of the 6th Int Conf on Development and Learning. Piscataway, NJ: IEEE, 2007: 282–287
[110] Santucci V G, Baldassarre G, Mirolli M. Grail: A goal-discovering robotic architecture for intrinsically-motivated learning[J]. IEEE Transactions on Cognitive and Developmental Systems, 2016, 8(3): 214−231 doi: 10.1109/TCDS.2016.2538961
[111] Auer P. Using confidence bounds for exploitation-exploration trade-offs[J]. Journal of Machine Learning Research, 2002, 3(12): 397−422
[112] Sun Qiyu, Fang Jinbao, Zheng Weixing, et al. Aggressive quadrotor flight using curiosity-driven reinforcement learning[J]. IEEE Transactions on Industrial Electronics, 2022, 69(12): 13838−13848 doi: 10.1109/TIE.2022.3144586
[113] Perovic G, Li N. Curiosity driven deep reinforcement learning for motion planning in multi-agent environment [C] //Proc of IEEE Int Conf on Robotics and Biomimetics. Piscataway, NJ: IEEE, 2019: 375–380
[114] 陈佳盼,郑敏华. 基于深度强化学习的机器人操作行为研究综述[J]. 机器人,2022,44(2):236−256 Chen Jiapan, Zheng Minhua. A survey of robot manipulation behavior research based on deep reinforcement learning[J]. Robot, 2022, 44(2): 236−256 (in Chinese)
[115] Shi Haobin, Shi Lin, Xu Meng, et al. End-to-end navigation strategy with deep reinforcement learning for mobile robots[J]. IEEE Transactions on Industrial Informatics, 2019, 16(4): 2393−2402
[116] Hirchoua B, Ouhbi B, Frikh B. Deep reinforcement learning based trading agents: Risk curiosity driven learning for financial rules-based policy [J/OL]. Expert Systems with Applications, 2021 [2022-09-06].https://www.sciencedirect.com/science/article/abs/pii/S0957417420311970
[117] Wesselmann P, Wu Y C, Gašić M. Curiosity-driven reinforcement learning for dialogue management [C] //Proc of IEEE In Conf on Acoustics, Speech and Signal Processing. Piscataway, NJ: IEEE, 2019: 7210–7214
[118] Silver D, Singh S, Precup D, et al. Reward is enough [J/OL]. Artificial Intelligence, 2021 [2022-09-06].https://www.sciencedirect.com/science/article/pii/S0004370221000862
[119] 文载道,王佳蕊,王小旭,等. 解耦表征学习综述[J]. 自动化学报,2022,48(2):351−374 Wen Zaidao, Wang Jiarui, Wang Xiaoxu, et al. A review of disentangled representation learning[J]. Acta Automatica Sinica, 2022, 48(2): 351−374 (in Chinese)
[120] Kipf T, Van Der Pol E, Welling M. Contrastive learning of structured world models [C/OL] //Proc of the 7th Int Conf on Learning Representations. 2019 [2022-09-06].https://arxiv.org/abs/1911.12247
[121] Watters N, Matthey L, Bosnjak M, et al. COBRA: Data-efficient model-based RL through unsupervised object discovery and curiosity-driven exploration [C/OL] //Proc of the 36th Int Conf on Machine Learning. New York: ACM, 2017 [2022-09-06].https://arxiv.org/abs/1905.09275v2
[122] Kulkarni T D, Narasimhan K R, Saeedi A, et al. Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation [C] //Proc of the 30th Int Conf on Neural Information Processing Systems. Cambridge, MA: MIT, 2016: 3682–3690
[123] Vezhnevets A S, Osindero S, Schaul T, et al. FeUdal networks for hierarchical reinforcement learning [C] //Proc of the 33rd Int Conf on Machine Learning. New York: ACM, 2017: 3540–3549
[124] Frans K, Ho J, Chen Xi, et al. Meta learning shared hierarchies [C/OL] //Proc of the 5th Int Conf on Learning Representations. 2017 [2022-09-06].https://arxiv.org/abs/1710.09767
[125] Ecoffet A, Huizinga J, Lehman J, et al. First return, then explore[J]. Nature, 2021, 590(7847): 580−586 doi: 10.1038/s41586-020-03157-9
[126] Chen T, Gupta S, Gupta A. Learning exploration policies for navigation [C/OL] //Proc of the 7th Int Conf on Learning Representations. 2019 [2022-09-06].https://arxiv.org/abs/1903.01959
[127] Chaplot D S, Gandhi D, Gupta S, et al. Learning to explore using active neural SLAM [C/OL] //Proc of the 8th Int Conf on Learning Representations. 2020 [2022-09-06].https://arxiv.org/abs/2004.05155
[128] Chaplot D S, Salakhutdinov R, Gupta A, et al. Neural topological SLAM for visual navigation [C] //Proc of IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2020: 12875–12884
[129] Berseth G, Geng D, Devin C, et al. SMiRL: Surprise minimizing reinforcement learning in unstable environments [C/OL] //Proc of the 9th Int Conf on Learning Representations. 2021 [2022−09-06].https://arxiv.org/abs/1912.05510
[130] Singh S, Lewis R L, Barto A G, et al. Intrinsically motivated reinforcement learning: An evolutionary perspective[J]. IEEE Transactions on Autonomous Mental Development, 2010, 2(2): 70−82 doi: 10.1109/TAMD.2010.2051031
[131] Sorg J, Singh S, Lewis R L. Reward design via online gradient ascent [C] //Proc of the 23rd Int Conf on Neural Information Processing Systems. Cambridge, MA: MIT, 2010: 2190–2198
[132] Guo X, Singh S, Lewis R, et al. Deep learning for reward design to improve Monte Carlo tree search in ATARI games [C] //Proc of the 25th Int Joint Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2016: 1519–1525
[133] Zheng Zeyu, Oh J, Hessel M, et al. What can learned intrinsic rewards capture [C] //Proc of the 36th Int Conf on Machine Learning. New York: ACM, 2020: 11436–11446
[134] Forestier S, Portelas R, Mollard Y, et al. Intrinsically motivated goal exploration processes with automatic curriculum learning[J]. Journal of Machine Learning Research, 2022, 23: 1−41
[135] Colas C, Fournier P, Chetouani M, et al. CURIOUS: Intrinsically motivated modular multigoal reinforcement learning [C] //Proc of the 35th Int Conf on Machine Learning. New York: ACM, 2019: 1331–1340
[136] Péré A, Forestier S, Sigaud O, et al. Unsupervised learning of goal spaces for intrinsically motivated goal exploration [C/OL] //Proc of the 6th Int Conf on Learning Representations. 2018 [2022-09-06].https://arxiv.org/abs/1803.00781
[137] Warde-Farley D, Van de Wiele T, Kulkarni T, et al. Unsupervised control through nonparametric discriminative rewards [C/OL] //Proc of the 7th Int Conf on Learning Representations. 2019 [2022-09-06].https://arxiv.org/abs/1811.11359
[138] Pong V H, Dalal M, Lin S, et al. Skew-fit: State-covering self-supervised reinforcement learning [C] //Proc of the 37th Int Conf on Machine Learning. New York: ACM, 2020: 7783−7792
[139] Bengio Y, Louradour J, Collobert R, et al. Curriculum learning [C] //Proc of the 26th Int Conf on Machine Learning. New York: ACM, 2009: 41–48
[140] Jaderberg M, Mnih V, Czarnecki W M, et al. Reinforcement learning with unsupervised auxiliary tasks [C/OL] //Proc of the 5th Int Conf on Learning Representations. 2017 [2022-09-06].https://arxiv.org/abs/1611.05397
[141] Sukhbaatar S, Lin Z, Kostrikov I, et al. Intrinsic motivation and automatic curricula via asymmetric self-play [C/OL] //Proc of the 6th Int Conf on Learning Representations. 2018 [2022-09-06].https://arxiv.org/abs/1703.05407
[142] Gronauer S, Diepold K. Multi-agent deep reinforcement learning: A survey[J]. Artificial Intelligence Review, 2022, 55(2): 895−943 doi: 10.1007/s10462-021-09996-w
[143] Iqbal S, Sha F. Coordinated exploration via intrinsic rewards for multi-agent reinforcement learning [J]. arXiv preprint, arXiv: 1905.12127, 2019
[144] Jaques N, Lazaridou A, Hughes E, et al. Social influence as intrinsic motivation for multi-agent deep reinforcement learning[C] //Proc of the 35th Int Conf on Machine Learning. New York: ACM, 2019: 3040–3049
[145] Guckelsberger C, Salge C, Togelius J. New and surprising ways to be mean: Adversarial NPCS with coupled empowerment minimization [C/OL] //Proc of IEEE Conf on Computational Intelligence and Games. 2018 [2022-09-06].https://ieeexplore.ieee.org/abstract/document/8490453