稳健的重尾线性赌博机算法

马兰霁弘; 赵鹏; 周志华

doi:10.7544/issn1000-1239.202220279

摘要: 线性赌博机模型是在线学习的基本模型之一，其每个摇臂的平均奖赏可以由线性函数进行参数化. 该模型具有坚实的理论保证和良好的实际建模能力，被广泛应用于各个场景. 然而在一些现实场景中，数据通常是从开放动态环境中收集得到，因而会存在数据不规范的问题，已有算法缺乏对此的稳健性. 特别关注2类数据的不规范性：奖励函数的回归参数可能随时间变化，环境噪声可能无界，甚至不服从亚高斯分布. 这2类问题分别被称为分布变化和重尾噪声. 为了应对这2类不利因素，提出一种基于置信上界的在线算法，该算法使用均值中位数估计器以处理潜在的重尾噪声，同时采用重启机制来解决分布变化问题. 在理论上，首先建立了问题的遗憾理论下界，进一步给出了算法的理论保障，所取得的结果可以回退到已有研究中没有分布变化或没有重尾噪声场景线性赌博机的理论结果. 此外，针对未知环境设计了实用的在线集成适应技术，并在合成和真实世界的数据集上进行了广泛的实验来验证其有效性.

Abstract: The linear bandits model is one of the most foundational online learning models, where a linear function parametrizes the mean payoff of each arm. The linear bandits model encompasses various applications with strong theoretical guarantees and practical modeling ability. However, existing algorithms suffer from the data irregularity that frequently emerges in real-world applications, as the data are usually collected from open and dynamic environments. In this paper, we are particularly concerned with two kinds of data irregularities: the underlying regression parameter could be changed with time, and the noise might not be bounded or even not sub-Gaussian, which are referred to as model drift and heavy-tailed noise, respectively. To deal with the two hostile factors, we propose a novel algorithm based on upper confidence bound. The median-of-means estimator is used to handle the potential heavy-tailed noise, and the restarting mechanism is employed to tackle the model drift. Theoretically, we establish the minimax lower bound to characterize the difficulty and prove that our algorithm enjoys a no-regret upper bound. The attained results subsume previous analysis for scenarios without either model drift or heavy-tailed noise. Empirically, we additionally design several online ensemble techniques to make our algorithm more adaptive to the environments. Extensive experiments are conducted on synthetic and real-world datasets to validate the effectiveness.

稳健的重尾线性赌博机算法

Robust Heavy-Tailed Linear Bandits Algorithm