Data-Source Adaptive Online Learning under Heteroscedastic Noise
Amith Bhat Hosadurga Anand ⋅ Aadirupa Saha ⋅ Haipeng Luo
Abstract
In this paper, we address the standard $K$-armed multi-armed bandit (MAB) problem with $M$ heterogeneous data sources, each exhibiting an unknown and distinct noise variance $\sigma_j^2$. We propose SOAR (Source-Optimistic Adaptive Regret Minimization), a novel algorithm that adaptively balances exploration and exploitation by jointly constructing upper confidence bounds for arm rewards and lower confidence bounds for data-source variances. Our theoretical analysis establishes that SOAR achieves a regret bound of $\tilde{O}\left({\sigma^*}^2 \sum_{i=2}^K \tfrac{1}{\Delta_i}\right)$, along with a preprocessing cost that depends only on the problem parameters $\{\sigma_j\}_{j=1}^M$ and $K$, and grows at most logarithmically with the horizon $T$; here ${\sigma^*}^2$ is the minimum source variance, and $\Delta_i$ denotes the suboptimality gap of the $i$-th arm's reward. The $\tilde{O}(\cdot)$ notation hides polylogarithmic factors in these problem parameters. This near-optimal instance-dependent regret bound underscores SOAR's effectiveness in dynamically managing heteroscedastic noise without incurring significant overhead. Experiments on synthetic problem instances and a real dataset (MovieLens 25M) demonstrate that our method significantly outperforms baseline bandit algorithms in terms of regret. Our work opens a new direction for adaptively leveraging multiple heterogeneous data sources, extending beyond traditional bandit frameworks.
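To make the joint confidence-bound construction concrete, the following is a minimal illustrative sketch (not the paper's exact SOAR update rules): each round first selects the data source with the lowest variance LCB, then the arm with the highest reward UCB scaled by that optimistic variance estimate. All function and variable names here are hypothetical, and the confidence widths are generic $\sqrt{\log t / n}$-style terms chosen for illustration.

```python
import math

def soar_style_round(counts, means, src_counts, src_m2, t):
    """One illustrative round of a SOAR-style rule (hypothetical sketch,
    not the paper's exact algorithm): pick the source with the smallest
    lower confidence bound on its variance, then the arm with the
    largest variance-scaled upper confidence bound on its mean reward."""

    # Lower confidence bound on each source's variance sigma_j^2:
    # empirical variance (from running sum of squared deviations src_m2)
    # minus a generic sqrt(2 log t / n_j) confidence width.
    def var_lcb(j):
        n = src_counts[j]
        if n < 2:
            return 0.0  # optimistic: an unsampled source may be low-noise
        emp_var = src_m2[j] / (n - 1)
        return max(0.0, emp_var - math.sqrt(2 * math.log(t) / n))

    j_star = min(range(len(src_counts)), key=var_lcb)

    # Upper confidence bound on each arm's mean reward, with the
    # exploration bonus scaled by the chosen source's optimistic variance.
    sigma2 = max(var_lcb(j_star), 1e-6)
    def arm_ucb(i):
        n = counts[i]
        if n == 0:
            return float("inf")  # force initial exploration of each arm
        return means[i] + math.sqrt(2 * sigma2 * math.log(t) / n)

    i_star = max(range(len(counts)), key=arm_ucb)
    return j_star, i_star
```

In this sketch, optimism on both quantities works in the same direction: underestimating a source's variance encourages sampling from it, while overestimating an arm's reward encourages pulling it, mirroring the UCB/LCB pairing described above.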