• China Excellent Sci-Tech Journal
  • CCF-Recommended Class A Chinese Journal
  • High-Quality Sci-Tech Journal in Computing, Class T1
Zhang Xiaofang, Zhou Qian, Liang Bin, Xu Jin. An Adaptive Algorithm in Multi-Armed Bandit Problem[J]. Journal of Computer Research and Development, 2019, 56(3): 643-654. DOI: 10.7544/issn1000-1239.2019.20180019

An Adaptive Algorithm in Multi-Armed Bandit Problem

  • Published Date: February 28, 2019
  • As an active area of machine learning, reinforcement learning has received extensive attention in recent years. The multi-armed bandit (MAB) problem is a typical instance of the exploration-exploitation dilemma in reinforcement learning. The stochastic multi-armed bandit (SMAB) problem, a classical MAB problem, is the basis of many newer MAB problems. To address the insufficient use of information and poor generalization ability of existing MAB methods, this paper presents an adaptive SMAB algorithm that balances exploration and exploitation based on the Chosen Number of the Arm with Minimal Estimation, CNAME for short. CNAME uses both the number of times an action has been chosen and its estimated value, and chooses actions according to an exploration probability that is updated adaptively. To control how quickly the exploration probability declines, a parameter w is introduced to adjust how strongly feedback influences the selection process. Furthermore, because CNAME does not depend on contextual information, it generalizes better. An upper bound on CNAME's regret is proved and analyzed theoretically. Experimental results in different scenarios show that CNAME efficiently achieves greater reward and smaller regret than commonly used methods, and that its generalization ability is strong.
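The abstract describes the mechanism only qualitatively: the exploration probability depends on how often the arm with the minimal estimate has been chosen, and a parameter w controls how fast that probability declines. The sketch below illustrates that idea on Bernoulli arms under stated assumptions. It is not the paper's exact update rule, which the abstract does not give. The function name `cname_like_bandit`, the rule p = w / (w + n_min^2), and the choice to explore by pulling the lowest-estimate arm are all hypothetical. Regret is measured as the standard pseudo-regret, the sum over steps of the gap between the best arm's mean and the chosen arm's mean.

```python
import random


def cname_like_bandit(arm_means, horizon, w=1.0, seed=0):
    """Hypothetical CNAME-style agent on Bernoulli arms (illustrative only).

    Assumption (not taken from the paper): the exploration probability is
    p = w / (w + n_min**2), where n_min is how many times the arm with the
    lowest estimated value has been chosen. Larger w keeps p high for longer;
    w = 0 gives a purely greedy agent.
    Returns (total_reward, pseudo_regret, counts).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k          # times each arm has been chosen
    estimates = [0.0] * k     # sample-mean reward estimate per arm
    best_mean = max(arm_means)
    total_reward = 0.0
    regret = 0.0
    for t in range(horizon):
        if t < k:
            arm = t  # pull every arm once so each estimate is initialised
        else:
            worst = min(range(k), key=lambda i: estimates[i])
            p_explore = w / (w + counts[worst] ** 2)  # hypothetical rule
            if rng.random() < p_explore:
                arm = worst  # explore: retry the lowest-estimate arm
            else:
                arm = max(range(k), key=lambda i: estimates[i])  # exploit
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
        total_reward += reward
        regret += best_mean - arm_means[arm]  # pseudo-regret increment
    return total_reward, regret, counts


total, regret, counts = cname_like_bandit([0.2, 0.5, 0.8], horizon=2000, w=1.0)
print(total, regret, counts)
```

Because every arm has been pulled at least once before the rule applies, n_min is never zero, so the probability is well defined even for w = 0. Comparing runs with different w values shows how the parameter trades early exploration against faster convergence to greedy play.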