ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2019, Vol. 56 ›› Issue (2): 254-261. doi: 10.7544/issn1000-1239.2019.20170578

• Review •

Methodologies for Imitation Learning via Inverse Reinforcement Learning: A Review

Zhang Kaifeng, Yu Yang

  1. (State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023) (zhangkf@lamda.nju.edu.cn)
  • Published: 2019-02-01
  • Online: 2019-02-01
  • Supported by:
    Natural Science Foundation of Jiangsu Province (BK20160066)

Abstract: With the wide application of reinforcement learning to autonomous robot control and complex decision-making problems, reinforcement learning has become a major research topic in machine learning. Traditional reinforcement learning is a paradigm in which an agent learns a policy through repeated autonomous interaction with its environment. However, in most sequential (multi-step) decision-making problems, the environment cannot provide the explicit, immediate reward signal that traditional reinforcement learning requires, or the reward signal is heavily delayed. This has become a bottleneck for applying reinforcement learning to more complex tasks. Inverse reinforcement learning is a class of algorithms that recover the reward function from expert demonstrations in a Markov decision process (MDP), under the assumption that the expert's demonstrated trajectories are optimal. Imitation learning algorithms that combine inverse reinforcement learning with conventional (forward) reinforcement learning have already achieved a series of results in areas such as robot control. This paper briefly introduces the basic concepts of reinforcement learning, inverse reinforcement learning, and imitation learning. It also discusses the problems that must be solved when applying inverse reinforcement learning, surveys imitation learning methods, including those based on inverse reinforcement learning, and notes the bottlenecks that arise when these methods are applied to real-world problems.

Key words: reinforcement learning, imitation learning, inverse reinforcement learning, Markov decision process, multi-step decision problem

CLC number: