Zhong Shan, Liu Quan, Fu Qiming, Zhang Zongzhang, Zhu Fei, Gong Shengrong. A Heuristic Dyna Optimizing Algorithm Using Approximate Model Representation[J]. Journal of Computer Research and Development, 2015, 52(12): 2764-2775. DOI: 10.7544/issn1000-1239.2015.20148160

A Heuristic Dyna Optimizing Algorithm Using Approximate Model Representation

  • Published Date: November 30, 2015
  • To address the problems of reinforcement learning under the Dyna framework, such as slow convergence, inappropriate representation of the environment model, and delayed learning after the environment changes, this paper proposes HDyna-AMR, a novel heuristic Dyna optimization algorithm based on an approximate model. HDyna-AMR approximates the Q-value function with a linear function and solves for the optimal value function by gradient descent. The algorithm has two phases: a learning phase and a planning phase. In the learning phase, it builds an approximate model of the environment through interaction and records how frequently each feature appears. In the planning phase, it plans with the approximate model, adding extra rewards according to the recorded feature frequencies. The paper also proves the convergence of the proposed algorithm theoretically. Experimentally, HDyna-AMR is applied to the extended Boyan Chain problem and the Mountain Car problem, and the results show that it obtains an approximately optimal policy in both discrete and continuous state spaces. Furthermore, HDyna-AMR outperforms Dyna-LAPS (Dyna-style planning with linear approximation and prioritized sweeping) and Sarsa(λ) in both convergence rate and robustness to environment changes.
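
The abstract describes the structure of HDyna-AMR (a linear Q-function trained by gradient descent, a learning phase that fits an approximate model and counts feature occurrences, and a planning phase that adds frequency-based extra rewards) but not its update rules. The following is therefore only a minimal Python sketch of a Dyna-style loop with that structure, not the authors' algorithm. Everything specific in it is an assumption made for illustration: the six-state toy chain (it is not the extended Boyan Chain), the one-hot features, the linear model `F`/`b`, the step sizes, and in particular the bonus `BONUS / sqrt(count)`, whose form the abstract does not specify.

```python
import random

N_STATES, N_ACTIONS = 6, 2        # toy chain; action 0 = left, 1 = right
ALPHA, BETA, GAMMA, EPSILON = 0.1, 0.5, 0.95, 0.3
BONUS, PLANNING_STEPS = 0.01, 10  # BONUS: hypothetical weight of the extra reward


def features(state):
    """One-hot features; the terminal state maps to the zero vector."""
    return [1.0 if i == state else 0.0 for i in range(N_STATES - 1)]


def step(state, action):
    """Deterministic chain; reward 1 for reaching the right end."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def best_q(theta, phi):
    return max(dot(theta[a], phi) for a in range(N_ACTIONS))


def td_update(theta, phi, action, reward, phi_next):
    """Semi-gradient Q-learning step for a linear Q(s, a) = theta[a] . phi."""
    delta = reward + GAMMA * best_q(theta, phi_next) - dot(theta[action], phi)
    for i, x in enumerate(phi):
        theta[action][i] += ALPHA * delta * x


def train(episodes=300, max_steps=200, seed=0):
    rng = random.Random(seed)
    n = N_STATES - 1
    theta = [[0.0] * n for _ in range(N_ACTIONS)]                  # Q weights
    F = [[[0.0] * n for _ in range(n)] for _ in range(N_ACTIONS)]  # phi' ~ F[a] phi
    b = [[0.0] * n for _ in range(N_ACTIONS)]                      # r ~ b[a] . phi
    counts = [[0] * n for _ in range(N_ACTIONS)]                   # feature frequency
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            phi = features(state)
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < EPSILON:
                action = rng.randrange(N_ACTIONS)
            else:
                qs = [dot(theta[a], phi) for a in range(N_ACTIONS)]
                action = rng.choice([a for a in range(N_ACTIONS) if qs[a] == max(qs)])
            nxt, reward, done = step(state, action)
            phi_next = features(nxt)
            # Learning phase: update Q, the linear model, and the counts.
            td_update(theta, phi, action, reward, phi_next)
            pred = [dot(row, phi) for row in F[action]]
            r_err = reward - dot(b[action], phi)
            for i, x in enumerate(phi):
                if x:
                    counts[action][i] += 1
                b[action][i] += BETA * r_err * x
                for j in range(n):
                    F[action][j][i] += BETA * (phi_next[j] - pred[j]) * x
            # Planning phase: replay visited features through the learned model,
            # adding the assumed frequency-based bonus to the simulated reward.
            seen = [(a, i) for a in range(N_ACTIONS) for i in range(n) if counts[a][i]]
            for _ in range(PLANNING_STEPS):
                a, i = rng.choice(seen)
                unit = features(i)
                sim_next = [dot(row, unit) for row in F[a]]
                sim_reward = dot(b[a], unit) + BONUS / counts[a][i] ** 0.5
                td_update(theta, unit, a, sim_reward, sim_next)
            if done:
                break
            state = nxt
    return theta


theta = train()
# Greedy action per non-terminal state under the learned weights.
policy = [max(range(N_ACTIONS), key=lambda a: theta[a][s]) for s in range(N_STATES - 1)]
print(policy)
```

In this sketch the bonus shrinks as a feature is seen more often, so planning initially favors rarely visited features; whether the paper's extra reward behaves this way is not stated in the abstract.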
