Wu Jinjin, Liu Quan, Chen Song, Yan Yan. Averaged Weighted Double Deep Q-Network[J]. Journal of Computer Research and Development, 2020, 57(3): 576-589. DOI: 10.7544/issn1000-1239.2020.20190159

Averaged Weighted Double Deep Q-Network

Funds: This work was supported by the National Natural Science Foundation of China (61772355, 61702055, 61502323, 61502329), the Jiangsu Provincial Natural Science Research University Major Projects (18KJA520011, 17KJA520004), the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education (Jilin University) (93K172014K04, 93K172017K18), the Suzhou Industrial Application of Basic Research Program (SYG201422), and the Priority Academic Program Development of Jiangsu Higher Education Institutions.
  • Published Date: February 29, 2020
  • The instability and variability of deep reinforcement learning algorithms strongly affect their performance. Deep Q-Network (DQN) was the first algorithm to successfully combine deep neural networks with Q-learning, and it has been shown to achieve human-level control on problems that require both rich perception of high-dimensional raw inputs and policy control. However, DQN overestimates action values, and this overestimation can degrade the agent's performance. Double deep Q-Network (DDQN) was proposed to mitigate overestimation, but it in turn tends to underestimate action values. In some complex reinforcement learning environments, even a small estimation error can have a large impact on the learned policy. To address both the overestimation in DQN and the underestimation in DDQN, this paper proposes a new deep reinforcement learning framework, averaged weighted double deep Q-Network (AWDDQN), which integrates the newly proposed weighted double estimator into DDQN. To further reduce the estimation error of the target value, the target is generated by averaging previously learned action-value estimates, and the number of estimates averaged is determined dynamically from the temporal difference error. Experimental results show that AWDDQN effectively reduces the estimation bias and improves the agent's performance in some Atari 2600 games.
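
The abstract gives no formulas, so the following is only a rough sketch of how the three ingredients it names could fit together. It assumes the weighted double estimator takes the form used in tabular weighted double Q-learning (a weight `beta` that blends the online and target estimates). The function names, the constant `c`, and in particular the threshold rule in `choose_k` are hypothetical illustrations, not the rule used in the paper.

```python
from typing import List, Sequence


def averaged_q(snapshots: List[Sequence[float]], k: int) -> List[float]:
    """Average the k most recent action-value estimates for one state."""
    recent = snapshots[-k:]
    n_actions = len(recent[0])
    return [sum(q[a] for q in recent) / len(recent) for a in range(n_actions)]


def choose_k(td_error: float, threshold: float, k_small: int, k_large: int) -> int:
    """Hypothetical rule: average over more estimates when the TD error is large."""
    return k_large if abs(td_error) > threshold else k_small


def weighted_double_target(
    q_online: Sequence[float],
    q_target: Sequence[float],
    reward: float,
    gamma: float = 0.99,
    c: float = 1.0,
) -> float:
    """Blend the online estimate with the double estimate of the greedy action."""
    actions = range(len(q_online))
    a_star = max(actions, key=lambda a: q_online[a])  # greedy action (online net)
    a_low = min(actions, key=lambda a: q_online[a])   # worst action (online net)
    # Assumed weight: a large value gap in the target net puts more weight on the online estimate.
    gap = abs(q_target[a_star] - q_target[a_low])
    beta = gap / (c + gap)
    blended = beta * q_online[a_star] + (1.0 - beta) * q_target[a_star]
    return reward + gamma * blended


# Usage: three stored target-network estimates of Q(s', .) for a 3-action state.
history = [[0.0, 1.0, 2.0], [0.4, 1.8, 2.4], [0.6, 2.2, 2.6]]
k = choose_k(td_error=1.5, threshold=1.0, k_small=1, k_large=2)
q_avg = averaged_q(history, k)
y = weighted_double_target([1.0, 3.0, 2.0], q_avg, reward=1.0, gamma=0.5)
```

In this sketch the blended value always lies between the double estimate `q_target[a_star]` and the online estimate `q_online[a_star]`, which is the sense in which such a weighting can sit between an underestimate and an overestimate.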
