    Liu Quan, Gao Yang, Chen Daoxu, Sun Jigui, Yao Wangshu. A Logical Reinforcement Learning Method Based on Heuristic Contour List[J]. Journal of Computer Research and Development, 2008, 45(11): 1824-1830.

    A Logical Reinforcement Learning Method Based on Heuristic Contour List


       

      Abstract: Reinforcement learning obtains an optimal policy through trial-and-error interaction with a dynamic environment. Its self-improving and online-learning properties make it one of the most important branches of machine learning. Because reinforcement learning has long been troubled by the “curse of dimensionality”, a heuristic contour list method is proposed on the basis of relational reinforcement learning. The method represents states, actions and Q-functions with first-order predicates that contain contour lists, making full use of the advantages of Prolog lists. By combining logical predicate rules with reinforcement learning, it forms a new logical reinforcement learning method, CCLORRL, whose convergence is proved. The method uses contour shape predicates to build shape state tables, which drastically reduces the state space, and uses heuristic rules to guide action selection, which reduces blind choices in states that do not appear in the samples. CCLORRL is applied to the game of Tetris, and experiments show that the method is fairly efficient.
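The two ideas in the abstract, a contour-based state table that collapses many boards into one shape and a heuristic that guides action choice in unseen states, can be illustrated with a short Python sketch. This is not CCLORRL: the paper works with Prolog first-order predicates and contour lists, whereas the sketch assumes a hypothetical stand-in in which the "contour" is the tuple of adjacent column-height differences clipped to `±cap`, and learning is plain tabular Q-learning. The names `contour`, `ContourQAgent`, the `heuristic` callback, and the values of `cap`, `alpha`, `gamma` and `epsilon` are all illustrative assumptions.

```python
import random
from collections import defaultdict


def column_heights(board):
    """Height of each column; board is a list of rows, top row first, 1 = filled."""
    n_rows = len(board)
    heights = []
    for c in range(len(board[0])):
        h = 0
        for r in range(n_rows):
            if board[r][c]:
                h = n_rows - r  # first filled cell from the top sets the height
                break
        heights.append(h)
    return heights


def contour(board, cap=2):
    """Hypothetical contour key: clipped height differences of adjacent columns.

    Boards with the same surface shape map to the same key regardless of
    their absolute height, which is what shrinks the state table.
    """
    h = column_heights(board)
    return tuple(max(-cap, min(cap, b - a)) for a, b in zip(h, h[1:]))


class ContourQAgent:
    """Tabular Q-learning over contour keys with a heuristic fallback (illustrative only)."""

    def __init__(self, heuristic, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)  # (state, action) -> estimated Q value
        self.seen = set()            # states that already have learned Q entries
        self.heuristic = heuristic   # hypothetical: heuristic(state, action) -> score
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def choose(self, state, actions):
        if state not in self.seen:
            # Shape never seen in the samples: let the heuristic pick instead of guessing.
            return max(actions, key=lambda a: self.heuristic(state, a))
        if self.rng.random() < self.epsilon:
            return self.rng.choice(actions)  # occasional exploration
        return max(actions, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, reward, next_state, next_actions):
        # Standard one-step Q-learning target: r + gamma * max_a' Q(s', a').
        best_next = max((self.q.get((next_state, a), 0.0) for a in next_actions), default=0.0)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
        self.seen.add(state)


board = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
]
print(column_heights(board))  # [2, 1, 0, 1]
print(contour(board))         # (-1, -1, 1)
```

The design choice here is that the key records only relative heights, so a board and the same surface shifted up by a full row share one Q-table entry; the clipping `cap` trades precision for a smaller table. The heuristic is consulted only for states without learned values, mirroring the abstract's use of heuristic rules for states absent from the samples.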

       
