基于逆强化学习的示教学习方法综述

张凯峰; 俞扬

doi:10.7544/issn1000-1239.2019.20170578

基于逆强化学习的示教学习方法综述

张凯峰,
俞扬

Methodologies for Imitation Learning via Inverse Reinforcement Learning: A Review

摘要

摘要: 随着强化学习在自动机器人控制、复杂决策问题上的广泛应用，强化学习逐渐成为机器学习领域中的一大研究热点.传统强化学习算法是一种通过不断与所处环境进行自主交互并从中得到策略的学习方式.然而，大多数多步决策问题难以给出传统强化学习所需要的反馈信号.这逐渐成为强化学习在更多复杂问题中实现应用的瓶颈.逆强化学习是基于专家决策轨迹最优的假设，在马尔可夫决策过程中逆向求解反馈函数的一类算法.目前，通过将逆强化学习和传统正向强化学习相结合设计的一类示教学习算法已经在机器人控制等领域取得了一系列成果.对强化学习、逆强化学习以及示教学习方法做一定介绍，此外还介绍了逆强化学习在应用过程中所需要解决的问题以及基于逆强化学习的示教学习方法.

Abstract: Motivated by applying reinforcement learning methods into autonomous robotic systems and complex decision making problems, reinforcement learning is becoming more and more popular in the community of machine learning. Traditional reinforcement learning is one kind of learning paradigm in machine learning field which is learning from the interactions between the agent and the environment. However, for the vast majority of cases, the environments for sequential decision making problems cannot provide an explicit reward signal immediately or the reward signal can be much delayed. This becomes the bottleneck for applying reinforcement learning methods into more complex tasks. So inverse reinforcement learning is proposed to recover the reward function from expert demonstrations in the Markov decision process (MDP) by assuming that the expert demonstrations is optimal. So far, the imitation learning algorithms which combines direct reinforcement learning approaches and inverse reinforcement learning approaches have already made a great progress. This paper briefly introduces the basic concepts of reinforcement learning, inverse reinforcement learning and imitation learning. And this paper also gives an introduction to the existing problems concerning with inverse reinforcement learning and some other methods in imitation learning. In addition, we also introduce some existing bottlenecks once applying the above methods into real world applications.

HTML全文

参考文献(0)

施引文献

资源附件(0)