A Deep Q-Network Method Based on Upper Confidence Bound Experience Sampling
Graphical Abstract
Abstract
Recently, deep reinforcement learning (DRL), which combines deep learning (DL) with reinforcement learning (RL), has become a hot topic in artificial intelligence. DRL has achieved major breakthroughs in solving optimal policies from high-dimensional inputs. To remove the temporal correlation among observed transitions, the deep Q-network uses a sampling mechanism called experience replay, which replays transitions drawn at random from a memory buffer and thereby breaks the dependence among samples. However, uniform random sampling does not account for the priority of transitions in the memory buffer. As a result, uninformative data are likely to be sampled excessively while informative samples are ignored during network training, which leads to longer training time and unsatisfactory training results. To address this problem, we introduce the idea of priority into the traditional deep Q-network and propose a prioritized sampling algorithm based on the upper confidence bound (UCB). It determines each sample's probability of being selected from the memory buffer according to its reward, time step, and sampling count. The proposed approach assigns higher selection probability to samples that have not yet been chosen, samples that are more valuable, and samples that lead to good results, which guarantees sample diversity and enables the agent to select actions more effectively. Finally, simulation experiments on Atari 2600 games verify the approach.
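To illustrate the idea, the following is a minimal sketch of a replay buffer whose sampling distribution follows a UCB-style score. The exact priority formula of the proposed method is not given in the abstract, so this sketch assumes a score that combines a transition's reward (exploitation term) with an exploration bonus that shrinks as the transition is sampled more often; the time-step term mentioned in the abstract is omitted. The class name UCBReplayBuffer and the coefficient c are illustrative, not part of the original work.

```python
import math
import random
from collections import namedtuple

Transition = namedtuple("Transition", "state action reward next_state done")


class UCBReplayBuffer:
    """Replay buffer that samples transitions with a UCB-style priority.

    Rarely sampled and high-reward transitions receive higher scores,
    so they are drawn more often than with uniform random replay.
    """

    def __init__(self, capacity, c=2.0):
        self.capacity = capacity
        self.c = c                  # exploration coefficient (assumed)
        self.buffer = []            # stored transitions
        self.sample_counts = []     # how many times each transition was drawn
        self.total_draws = 0

    def push(self, *args):
        # Drop the oldest transition once the buffer is full.
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.sample_counts.pop(0)
        self.buffer.append(Transition(*args))
        self.sample_counts.append(0)

    def _ucb_score(self, idx):
        # Exploitation: the stored reward.
        # Exploration: UCB bonus that is large for rarely sampled transitions.
        bonus = self.c * math.sqrt(
            math.log(self.total_draws + 1) / (self.sample_counts[idx] + 1)
        )
        return self.buffer[idx].reward + bonus

    def sample(self, batch_size):
        scores = [self._ucb_score(i) for i in range(len(self.buffer))]
        # Convert scores into a sampling distribution (softmax keeps it stochastic).
        max_s = max(scores)
        weights = [math.exp(s - max_s) for s in scores]
        indices = random.choices(range(len(self.buffer)), weights=weights, k=batch_size)
        for i in indices:
            self.sample_counts[i] += 1
            self.total_draws += 1
        return [self.buffer[i] for i in indices]
```

In a DQN training loop, this buffer would simply replace the uniform replay memory: transitions are stored with `push(...)` after each environment step, and minibatches are drawn with `sample(batch_size)` for the Q-network update.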