
A Heuristic Dyna Optimizing Algorithm Using Approximate Model Representation

Zhong Shan, Liu Quan, Fu Qiming, Zhang Zongzhang, Zhu Fei, Gong Shengrong

Citation: Zhong Shan, Liu Quan, Fu Qiming, Zhang Zongzhang, Zhu Fei, Gong Shengrong. A Heuristic Dyna Optimizing Algorithm Using Approximate Model Representation[J]. Journal of Computer Research and Development, 2015, 52(12): 2764-2775. DOI: 10.7544/issn1000-1239.2015.20148160. CSTR: 32373.14.issn1000-1239.2015.20148160

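Funding: National Natural Science Foundation of China (61272005, 61303108, 61373094, 61472262, 61502323, 61502329); Natural Science Foundation of Jiangsu Province (BK2012616); Natural Science Research Project of Jiangsu Higher Education Institutions (13KJB520020); Fund of the Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University (93K172014K04); Suzhou Applied Basic Research Program (SYG201422)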
Details
  • CLC number: TP181

A Heuristic Dyna Optimizing Algorithm Using Approximate Model Representation

  • Abstract: Table-lookup Dyna optimization algorithms converge slowly in large state spaces, have difficulty representing the environment model, and lag in learning when the environment changes. To address these problems, this paper proposes a heuristic Dyna optimization algorithm using approximate model representation (HDyna-AMR), which approximates the Q-value function with a linear function and solves for the optimal value function by gradient descent. HDyna-AMR consists of a learning phase and a planning phase. In the learning phase, the algorithm builds an approximate model of the environment from samples of the agent's interaction with the environment and records how often each feature appears. In the planning phase, it updates the value function by planning on the approximate model and sets an extra reward according to the feature frequencies recorded during model approximation. The paper proves the convergence of HDyna-AMR theoretically. The algorithm is applied to the extended Boyan chain problem and the Mountain Car problem. The experimental results show that HDyna-AMR learns an approximately optimal policy in both discrete and continuous state spaces. Compared with Dyna-LAPS (Dyna-style planning with linear approximation and prioritized sweeping) and Sarsa(λ), HDyna-AMR converges faster and corrects its approximate model more promptly when the environment changes.
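The abstract does not state the update rules, so the following Python sketch only illustrates the two-phase structure it describes; it is not the authors' HDyna-AMR implementation. Everything specific here is an assumption made for illustration: the small deterministic chain task, the one-hot features, the Q-learning style target, the linear model (`F` for next features, `b` for rewards) trained by gradient descent, and the bonus form `kappa / sqrt(1 + count)`.

```python
import random

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def run_dyna_sketch(n_states=6, episodes=300, planning_steps=5, alpha=0.1,
                    beta=0.1, gamma=0.95, epsilon=0.1, kappa=0.01, seed=0):
    """Linear Dyna-style loop on a hypothetical chain task (all settings assumed)."""
    rng = random.Random(seed)
    n_actions, goal = 2, n_states - 1          # action 0 = left, 1 = right
    # One-hot features: the simplest case of a linear representation.
    phi = [[1.0 if i == j else 0.0 for j in range(n_states)]
           for i in range(n_states)]
    theta = [[0.0] * n_states for _ in range(n_actions)]   # Q(s,a) = theta[a] . phi(s)
    F = [[[0.0] * n_states for _ in range(n_states)]
         for _ in range(n_actions)]                        # next-feature model
    b = [[0.0] * n_states for _ in range(n_actions)]       # reward model
    counts = [[0.0] * n_states for _ in range(n_actions)]  # feature frequencies

    def q_values(x):
        return [dot(theta[a], x) for a in range(n_actions)]

    def td_update(a, x, target):
        # Gradient-descent step on the squared error between target and Q(x, a).
        delta = target - dot(theta[a], x)
        for i in range(n_states):
            theta[a][i] += alpha * delta * x[i]

    for _episode in range(episodes):
        s = 0
        for _step in range(100):               # step cap per episode
            x = phi[s]
            q = q_values(x)
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:                              # greedy with random tie-breaking
                a = rng.choice([k for k in range(n_actions) if q[k] == max(q)])
            s2 = max(s - 1, 0) if a == 0 else s + 1
            done = s2 == goal
            r = 1.0 if done else 0.0
            x2 = [0.0] * n_states if done else phi[s2]   # terminal features are zero

            # Learning phase: update Q, the approximate model, and the counts.
            td_update(a, x, r + gamma * max(q_values(x2)))
            pred = matvec(F[a], x)
            r_err = r - dot(b[a], x)
            for i in range(n_states):
                for j in range(n_states):
                    F[a][i][j] += beta * (x2[i] - pred[i]) * x[j]
                b[a][i] += beta * r_err * x[i]
                counts[a][i] += x[i]

            # Planning phase: simulated updates from the model plus an extra reward.
            for _plan in range(planning_steps):
                e = phi[rng.randrange(n_states)]
                pa = rng.randrange(n_actions)
                bonus = kappa / (1.0 + dot(counts[pa], e)) ** 0.5  # assumed form
                x_hat = matvec(F[pa], e)
                td_update(pa, e, dot(b[pa], e) + bonus
                          + gamma * max(q_values(x_hat)))
            if done:
                break
            s = s2
    return theta, F, b, counts
```

Calling `run_dyna_sketch()` returns the learned weights, the model, and the counts. The bonus here shrinks as a feature is seen more often, which is one plausible way to use recorded frequencies; the paper's actual rule may differ.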
Publication history
  • Published: 2015-11-30
