Zhao Fengfei and Qin Zheng. A Multi-Motive Reinforcement Learning Framework[J]. Journal of Computer Research and Development, 2013, 50(2): 240-247.

    A Multi-Motive Reinforcement Learning Framework

Abstract: Traditional reinforcement learning methods such as Q-learning maintain a table that maps states to actions. This simple dual-layer mapping structure has been widely used in many applications. However, the dual-layer state-action structure lacks flexibility, and prior knowledge cannot be used effectively to guide the learning process. To solve this problem, a new reinforcement learning framework called multi-motive reinforcement learning (MMRL) is proposed. The MMRL framework introduces a motive layer between the state layer and the action layer, in which multiple motives can be set based on experience. In this way, the original state-action dual-layer structure is extended to a state-motive-action triple-layer structure. Under this framework, two corresponding algorithms are presented: MMQ-unique and MMQ-voting. Moreover, it is shown that traditional reinforcement learning methods can be seen as a degenerate form of multi-motive reinforcement learning; that is, the multi-motive reinforcement learning framework is a superset of the traditional methods. By adding the motive layer, the framework and its algorithms improve the flexibility of reinforcement learning and exploit prior knowledge to speed up the learning process. Experiments demonstrate that, with reasonably chosen motives, multi-motive reinforcement learning performs significantly better than traditional reinforcement learning methods.
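
To make the triple-layer idea concrete, the following Python sketch shows one plausible reading of the framework: each motive owns its own Q-table and encodes prior knowledge as a state-dependent action filter, and a voting agent combines the motives by majority vote over their greedy choices, in the spirit of MMQ-voting. The abstract does not specify the internals of MMQ-unique or MMQ-voting, so the MotiveQLearner and VotingAgent classes, the allowed-action filter, the hyperparameters, and the voting rule below are illustrative assumptions, not the paper's definitions.

```python
import random
from collections import defaultdict

class MotiveQLearner:
    """One Q-learner per motive. Prior knowledge enters through `allowed`,
    a function mapping a state to the subset of actions this motive permits
    (an illustrative assumption, not the paper's exact mechanism)."""

    def __init__(self, actions, allowed, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.actions = actions              # full action set
        self.allowed = allowed              # state -> subset of actions
        self.q = defaultdict(float)         # Q-values keyed by (state, action)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state):
        acts = self.allowed(state) or self.actions   # fall back to all actions
        if random.random() < self.epsilon:           # epsilon-greedy exploration
            return random.choice(acts)
        return max(acts, key=lambda a: self.q[(state, a)])

    def update(self, s, a, reward, s_next):
        # Standard one-step Q-learning update on this motive's own table.
        best_next = max(self.q[(s_next, a2)] for a2 in self.actions)
        self.q[(s, a)] += self.alpha * (reward + self.gamma * best_next - self.q[(s, a)])

class VotingAgent:
    """Combines several motive learners by majority vote over their greedy
    actions (the vote rule and tie-breaking here are assumptions)."""

    def __init__(self, learners):
        self.learners = learners

    def choose(self, state):
        votes = defaultdict(int)
        for learner in self.learners:
            votes[learner.choose(state)] += 1
        return max(votes, key=votes.get)     # most-voted action wins

    def update(self, s, a, reward, s_next):
        for learner in self.learners:        # every motive learns from the shared experience
            learner.update(s, a, reward, s_next)
```

Under this reading, the degenerate-form claim in the abstract corresponds to the single-motive case: VotingAgent([MotiveQLearner(actions, lambda s: actions)]) behaves exactly like standard epsilon-greedy Q-learning, since the lone motive never filters the action set and wins every vote.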
