Abstract:
The linear bandits model is one of the most foundational online learning models, where a linear function parametrizes the mean payoff of each arm. The linear bandits model encompasses various applications with strong theoretical guarantees and practical modeling ability. However, existing algorithms suffer from the data irregularity that frequently emerges in real-world applications, as the data are usually collected from open and dynamic environments. In this paper, we are particularly concerned with two kinds of data irregularities: the underlying regression parameter could be changed with time, and the noise might not be bounded or even not sub-Gaussian, which are referred to as model drift and heavy-tailed noise, respectively. To deal with the two hostile factors, we propose a novel algorithm based on upper confidence bound. The median-of-means estimator is used to handle the potential heavy-tailed noise, and the restarting mechanism is employed to tackle the model drift. Theoretically, we establish the minimax lower bound to characterize the difficulty and prove that our algorithm enjoys a no-regret upper bound. The attained results subsume previous analysis for scenarios without either model drift or heavy-tailed noise. Empirically, we additionally design several online ensemble techniques to make our algorithm more adaptive to the environments. Extensive experiments are conducted on synthetic and real-world datasets to validate the effectiveness.