A Deep Q-Network Method Based on Upper Confidence Bound Experience Sampling
Abstract
Recently, deep reinforcement learning (DRL), which combines deep learning (DL) with reinforcement learning (RL), has become a prominent topic in artificial intelligence. DRL has achieved major breakthroughs in solving for optimal policies from high-dimensional inputs. To remove the temporal correlation among observed transitions, the deep Q-network (DQN) uses a sampling mechanism called experience replay, which replays transitions drawn uniformly at random from a memory buffer and thereby breaks the dependence among samples. However, uniform random sampling ignores the priority of transitions in the memory buffer. As a result, during network training it tends to sample uninformative transitions excessively while overlooking informative ones, which prolongs training and degrades performance. To address this problem, we introduce the notion of priority into the traditional deep Q-network and propose a prioritized sampling algorithm based on the upper confidence bound (UCB). The algorithm determines each sample's probability of being selected from the memory buffer using its reward, time step, and sampling count. It assigns higher selection probability to samples that have not yet been chosen, samples that are more valuable, and samples that yield good results, which preserves sample diversity and enables the agent to select actions more effectively. Finally, simulation experiments on Atari 2600 games verify the proposed approach.
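As a rough illustration of the idea summarized above, the sketch below shows one way a UCB-style priority could combine reward, time step, and sampling count when drawing transitions from a replay buffer. The abstract does not specify the exact formula, so the class name, the coefficient `c`, the recency weighting, and the normalization scheme are all illustrative assumptions rather than the paper's actual method.

```python
import math
import random
from collections import namedtuple

# Illustrative sketch only: combines reward, recency (time step), and a
# UCB-style bonus over per-sample selection counts. The real priority
# definition in the paper may differ.
Transition = namedtuple("Transition", "state action reward next_state done t")

class UCBReplayBuffer:
    def __init__(self, capacity=10000, c=1.0):
        self.capacity = capacity
        self.c = c              # exploration coefficient (assumed)
        self.buffer = []
        self.counts = []        # how often each transition has been sampled
        self.total_draws = 0

    def push(self, transition):
        # Drop the oldest transition when the buffer is full.
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.counts.pop(0)
        self.buffer.append(transition)
        self.counts.append(0)

    def _priority(self, i):
        tr = self.buffer[i]
        # Value term: reward plus a mild preference for recent time steps.
        value = tr.reward + 0.01 * tr.t
        # UCB bonus: rarely sampled transitions receive a larger bonus,
        # which keeps unvisited samples likely to be selected.
        bonus = self.c * math.sqrt(
            math.log(self.total_draws + 1) / (self.counts[i] + 1))
        return value + bonus

    def sample(self, batch_size):
        # Convert priorities to selection probabilities by shifting them
        # to be positive and sampling proportionally.
        prios = [self._priority(i) for i in range(len(self.buffer))]
        lo = min(prios)
        weights = [p - lo + 1e-3 for p in prios]
        idx = random.choices(range(len(self.buffer)), weights=weights,
                             k=batch_size)
        self.total_draws += batch_size
        for i in idx:
            self.counts[i] += 1
        return [self.buffer[i] for i in idx]
```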