Bai Chenjia, Liu Peng, Zhao Wei, Tang Xianglong. Active Sampling for Deep Q-Learning Based on TD-error Adaptive Correction[J]. Journal of Computer Research and Development, 2019, 56(2): 262-280. DOI: 10.7544/issn1000-1239.2019.20170812

Active Sampling for Deep Q-Learning Based on TD-error Adaptive Correction

More Information
  • Published Date: January 31, 2019

Abstract

Deep reinforcement learning (DRL) is one of the research hotspots in artificial intelligence, and deep Q-learning is one of its representative achievements; in some domains its performance has matched or exceeded that of human experts. Training a deep Q-network requires a large number of samples, which are obtained through interaction between the agent and the environment; this interaction is usually computationally expensive, and the risk it entails cannot always be avoided. To address the sample-efficiency problem of deep Q-learning, we propose an active sampling method based on TD-error adaptive correction. In existing deep Q-learning methods, the update of the storage priorities in the experience memory lags behind the update of the Q-network parameters; as a result, the stored priorities no longer reflect the true distribution of TD-errors in the experience memory, and many samples are never selected for Q-network training. The proposed method uses the replay period of each sample and the state of the Q-network to build a priority bias model that estimates the true priority of every sample in the experience memory as the Q-network iterates. Samples are drawn from the experience memory according to the corrected priorities, and the parameters of the bias model are adaptively updated in a piecewise fashion. We analyze the complexity of the algorithm and the relationship between learning performance and both the order of the polynomial features and the update period of the model parameters. The method is validated on the Atari 2600 platform. Experimental results show that it speeds up learning, reduces the number of interactions between the agent and the environment, and improves the quality of the learned policy.
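
Below is a minimal, self-contained Python sketch of the corrected-priority sampling idea summarized in the abstract. It is an illustration under stated assumptions, not the paper's implementation: the class name BiasCorrectedReplay, the normalized-staleness feature, and the periodic least-squares refit are hypothetical choices, and the paper's bias model also conditions on the Q-network state, which is omitted here.

```python
import numpy as np


class BiasCorrectedReplay:
    """Prioritized replay with staleness-corrected priorities (sketch)."""

    def __init__(self, capacity, poly_order=2, alpha=0.6, refit_every=1000):
        self.capacity = capacity
        self.poly_order = poly_order        # order of the polynomial feature
        self.alpha = alpha                  # PER priority exponent
        self.refit_every = refit_every      # update period of the bias model
        self.data = [None] * capacity
        self.prio = np.zeros(capacity)      # |TD-error| when last computed
        self.stamp = np.zeros(capacity, dtype=np.int64)  # step of that computation
        self.size = self.pos = self.step = 0
        self.coef = np.zeros(poly_order + 1)  # bias-model parameters (all zero = plain PER)
        self._drift = []                    # observed (staleness, priority-drift) pairs

    def add(self, transition, td_error):
        """Store a transition with its current |TD-error| as priority."""
        self.data[self.pos] = transition
        self.prio[self.pos] = abs(td_error)
        self.stamp[self.pos] = self.step
        self.pos = (self.pos + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def _corrected_priorities(self):
        """Stored priority plus a polynomial estimate of how much it has drifted."""
        staleness = (self.step - self.stamp[:self.size]) / max(self.step, 1)
        bias = np.polyval(self.coef, staleness)
        return np.maximum(self.prio[:self.size] + bias, 1e-6)

    def sample(self, batch_size):
        """Draw a batch according to the corrected priorities."""
        p = self._corrected_priorities() ** self.alpha
        p /= p.sum()
        idx = np.random.choice(self.size, size=batch_size, p=p)
        return idx, [self.data[i] for i in idx]

    def update(self, idx, new_td_errors):
        """After replaying a batch: log the observed drift, refresh stored
        priorities, and periodically refit the bias model by least squares."""
        self.step += 1
        for i, td in zip(idx, new_td_errors):
            staleness = (self.step - self.stamp[i]) / max(self.step, 1)
            self._drift.append((staleness, abs(td) - self.prio[i]))
            self.prio[i], self.stamp[i] = abs(td), self.step
        if self.step % self.refit_every == 0 and len(self._drift) > self.poly_order:
            x, y = map(np.asarray, zip(*self._drift))
            self.coef = np.polyfit(x, y, self.poly_order)  # highest order first
            self._drift.clear()
```

In this simplified view, keeping coef at zero reduces the scheme to ordinary proportional prioritized replay; the periodic least-squares refit stands in for the paper's piecewise adaptive update of the bias-model parameters, and poly_order and refit_every correspond to the polynomial-feature order and parameter-update period whose effect on learning performance the paper analyzes.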
Related Articles

    [1]Zhang Jing, Wang Ziming, Ren Yonggong. A3C Deep Reinforcement Learning Model Compression and Knowledge Extraction[J]. Journal of Computer Research and Development, 2023, 60(6): 1373-1384. DOI: 10.7544/issn1000-1239.202111186
    [2]Ma Ang, Yu Yanhua, Yang Shengli, Shi Chuan, Li Jie, Cai Xiuxiu. Survey of Knowledge Graph Based on Reinforcement Learning[J]. Journal of Computer Research and Development, 2022, 59(8): 1694-1722. DOI: 10.7544/issn1000-1239.20211264
    [3]Yu Xian, Li Zhenyu, Sun Sheng, Zhang Guangxing, Diao Zulong, Xie Gaogang. Adaptive Virtual Machine Consolidation Method Based on Deep Reinforcement Learning[J]. Journal of Computer Research and Development, 2021, 58(12): 2783-2797. DOI: 10.7544/issn1000-1239.2021.20200366
    [4]Qi Faxin, Tong Xiangrong, Yu Lei. Agent Trust Boost via Reinforcement Learning DQN[J]. Journal of Computer Research and Development, 2020, 57(6): 1227-1238. DOI: 10.7544/issn1000-1239.2020.20190403
    [5]Fan Hao, Xu Guangping, Xue Yanbing, Gao Zan, Zhang Hua. An Energy Consumption Optimization and Evaluation for Hybrid Cache Based on Reinforcement Learning[J]. Journal of Computer Research and Development, 2020, 57(6): 1125-1139. DOI: 10.7544/issn1000-1239.2020.20200010
    [6]Zhang Wentao, Wang Lu, Cheng Yaodong. Performance Optimization of Lustre File System Based on Reinforcement Learning[J]. Journal of Computer Research and Development, 2019, 56(7): 1578-1586. DOI: 10.7544/issn1000-1239.2019.20180797
    [7]Zhang Kaifeng, Yu Yang. Methodologies for Imitation Learning via Inverse Reinforcement Learning: A Review[J]. Journal of Computer Research and Development, 2019, 56(2): 254-261. DOI: 10.7544/issn1000-1239.2019.20170578
    [8]Zhao Fengfei, Qin Zheng. A Multi-Motive Reinforcement Learning Framework[J]. Journal of Computer Research and Development, 2013, 50(2): 240-247.
    [9]Lin Fen, Shi Chuan, Luo Jiewen, Shi Zhongzhi. Dual Reinforcement Learning Based on Bias Learning[J]. Journal of Computer Research and Development, 2008, 45(9): 1455-1462.
    [10]Shi Chuan, Shi Zhongzhi, Wang Maoguang. Online Hierarchical Reinforcement Learning Based on Path-matching[J]. Journal of Computer Research and Development, 2008, 45(9).
Cited by

    Journal citations (3)

    1. Pan Jia, Yu Xiulan. D2D cooperative transmission strategy based on social awareness and payment incentives. Application Research of Computers. 2023(06): 1801-1805.
    2. Liu Linlan, Tan Zhenyang, Shu Jian. Node importance evaluation method for opportunistic networks based on graph neural network. Journal of Computer Research and Development. 2022(04): 834-851.
    3. Wang Chun, Wu Shirong. Research on the data distribution mechanism of ship ad hoc networks. Ship Science and Technology. 2020(14): 166-168.

    Other citations (2)
