ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2019, Vol. 56, Issue (2): 254-261. doi: 10.7544/issn1000-1239.2019.20170578


Methodologies for Imitation Learning via Inverse Reinforcement Learning: A Review

Zhang Kaifeng, Yu Yang   

  (State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023)
  • Online: 2019-02-01

Abstract: Motivated by the application of reinforcement learning methods to autonomous robotic systems and complex decision-making problems, reinforcement learning is becoming increasingly popular in the machine learning community. Traditional reinforcement learning is a learning paradigm in machine learning in which an agent learns from its interactions with the environment. However, in the vast majority of cases, the environments of sequential decision-making problems cannot provide an explicit and immediate reward signal, or the reward signal can be heavily delayed. This has become a bottleneck for applying reinforcement learning methods to more complex tasks. Inverse reinforcement learning is therefore proposed to recover the reward function of a Markov decision process (MDP) from expert demonstrations, under the assumption that the expert demonstrations are optimal. So far, imitation learning algorithms that combine direct reinforcement learning approaches with inverse reinforcement learning approaches have already made great progress. This paper briefly introduces the basic concepts of reinforcement learning, inverse reinforcement learning, and imitation learning. It also gives an introduction to the open problems concerning inverse reinforcement learning and to other methods in imitation learning. In addition, we introduce some existing bottlenecks in applying the above methods to real-world applications.
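For concreteness, one standard way to make the abstract's description precise is the linear-reward, feature-expectation-matching formulation of inverse reinforcement learning (a hedged sketch in the style of Abbeel and Ng's apprenticeship learning; the notation and the specific formulation are assumptions for illustration, not necessarily those used in this review). The environment is an MDP $(S, A, P, R, \gamma)$, the expert is assumed to act (near-)optimally under an unknown reward $R^{*}$, and IRL searches for a reward under which the expert's demonstrated behavior is at least as good as that of any other policy:

\[
R(s) = w^{\top}\phi(s), \qquad
\mu(\pi) = \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t}\phi(s_t) \,\Big|\, \pi\Big],
\]
\[
\text{find } w \text{ such that } w^{\top}\mu(\pi_E) \;\ge\; w^{\top}\mu(\pi) \quad \text{for all policies } \pi,
\]

where $\phi$ is a state feature map and the expert's feature expectation $\mu(\pi_E)$ is estimated from the demonstrations. An imitation-learning procedure of the kind surveyed here then alternates between this reward-recovery step and a direct reinforcement-learning step that trains a policy against the recovered reward.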

Key words: reinforcement learning, imitation learning, inverse reinforcement learning, Markov decision process, multi-step decision problem
