    Citation: Liu Quan, Yan Qicui, Fu Yuchen, Hu Daojing, and Gong Shengrong. A Hierarchical Reinforcement Learning Method Based on Heuristic Reward Function[J]. Journal of Computer Research and Development, 2011, 48(12): 2352-2358.

    A Hierarchical Reinforcement Learning Method Based on Heuristic Reward Function

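    • Abstract (from the Chinese version): Reinforcement learning applications often suffer from the “curse of dimensionality”, in which the size of the state space grows exponentially with the number of features, and from slow convergence. To address both problems, a hierarchical reinforcement learning method based on a heuristic reward function is proposed. The method not only greatly reduces the environment state space but also speeds up the convergence of learning. The algorithm is applied to a Tetris simulation platform. After setting the experimental parameters and analyzing the algorithm's performance, the results show that the hierarchical reinforcement learning method with a heuristic reward function can alleviate the “curse of dimensionality” to some extent and converges quickly.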

       

      Abstract: Reinforcement learning concerns the control of an autonomous agent in an unknown environment, whose possible situations make up the state space. The agent has no prior knowledge of the environment and can learn about it only by acting in it. Reinforcement learning, and Q-learning in particular, faces a major problem: learning the Q-function in tabular form may be infeasible, because the memory needed to store the table is excessive and the Q-function converges only after every state has been visited many times. Large state spaces therefore inevitably give rise to the “curse of dimensionality”. A hierarchical reinforcement learning method based on a heuristic reward function is proposed to address the curse of dimensionality, in which the state space grows exponentially with the number of features, and the slow convergence that accompanies it. The method greatly reduces the state space and speeds up learning. Actions are chosen purposefully and efficiently so as to optimize the reward function and accelerate convergence. The method is applied to the game of Tetris. Analysis of the algorithm and the experimental results show that the method partly solves the curse of dimensionality and markedly accelerates convergence.
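
To make the tabular bottleneck described in the abstract concrete, the following is a minimal sketch of a standard tabular Q-learning update together with a count of the table entries needed when a state is encoded as a vector of binary features. It is not the hierarchical method or the heuristic reward function proposed in the paper. The function names, the binary-feature encoding, and the values of `alpha` and `gamma` are assumptions chosen for illustration.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step (alpha and gamma are illustrative values).

    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


def table_entries(num_binary_features, num_actions):
    """Upper bound on Q-table entries, assuming each state is a vector of binary features."""
    return (2 ** num_binary_features) * num_actions


# Hypothetical two-action toy problem: the table starts at zero everywhere.
q = defaultdict(float)
actions = ["left", "right"]
updated = q_update(q, "s0", "left", reward=1.0, next_state="s1", actions=actions)

# Each extra binary feature doubles the table size.
small = table_entries(10, 4)
large = table_entries(20, 4)
```

Under this assumed encoding, going from 10 to 20 binary features multiplies the table size by 1024. That is the exponential growth the abstract refers to, and every entry must also be visited many times before its value settles.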

       
