Heavy-Tailed Linear Bandits: Huber Regression with One-Pass Update
Jing Wang · Yu-Jie Zhang · Peng Zhao · Zhi-Hua Zhou
Abstract
We study the stochastic linear bandits with heavy-tailed noise. Two principled strategies for handling heavy-tailed noise, truncation and median-of-means, have been introduced to heavy-tailed bandits. Nonetheless, these methods rely on specific noise assumptions or bandit structures, limiting their applicability to general settings. The recent work [Huang et al.2024] develop a soft truncation method via the adaptive Huber regression to address these limitations. However, their method suffers undesired computational cost: it requires storing all historical data and performing a full pass over these data at each round. In this paper, we propose a \emph{one-pass} algorithm based on the online mirror descent framework. Our method updates using only current data at each round, reducing the per-round computational cost from $\mathcal{O}(t \log T)$ to $\mathcal{O}(1)$ with respect to current round $t$ and the time horizon $T$, and achieves a near-optimal and variance-aware regret of order $\widetilde{\mathcal{O}}\big(d T^{\frac{1-\varepsilon}{2(1+\varepsilon)}} \sqrt{\sum_{t=1}^T \nu_t^2} + d T^{\frac{1-\varepsilon}{2(1+\varepsilon)}}\big)$ where $d$ is the dimension and $\nu_t^{1+\varepsilon}$ is the $(1+\varepsilon)$-th central moment of reward at round $t$.
Lay Summary
In online decision-making with streaming data, such as in financial markets and online advertising, observations are often affected by heavy-tailed noise, making learning and decision-making particularly challenging. Existing methods rely on storing and processing all past data at each step to achieve strong performance, which is time-consuming and inefficient. This paper introduces a new algorithm that makes decisions using only the current observation, drastically reducing computational cost while maintaining high performance—both theoretically proven and empirically validated.
Video
Chat is not available.
Successful Page Load