

Poster & Spotlight Talk in the Workshop on Reinforcement Learning Theory

Solving Multi-Arm Bandit Using a Few Bits of Communication

Osama Hanna · Lin Yang · Christina Fragouli


Abstract: The multi-armed bandit (MAB) problem is an active learning framework that aims to select the best among a set of actions by sequentially observing rewards. Recently, it has attracted attention in a number of applications over wireless networks, where communication constraints can form a bottleneck. Yet existing MAB algorithms typically fail to address this issue and can become infeasible in such settings. In this paper, we propose QuBan, a generic reward quantization algorithm that applies to any (no-regret) multi-armed bandit algorithm. The modified algorithm requires, on average, only a few (as few as $3$) bits to be sent per iteration, while preserving the same regret as the original algorithm. Our upper bounds hold under mild assumptions on the reward distributions and apply to all current (and future) MAB algorithms, including those used in contextual bandits. We also numerically evaluate the application of QuBan to widely used algorithms such as UCB and $\epsilon$-greedy.
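
The abstract only sketches QuBan at a high level, so the following is a minimal illustrative sketch of the general idea, not the paper's actual scheme: rewards are stochastically rounded to a small integer grid centered at the learner's current mean estimate, so that only a few bits of index travel per round while the decoded reward remains unbiased inside the grid. The function names, the `levels` parameter, and the unit step size are all assumptions introduced for illustration.

```python
import numpy as np

def stochastic_quantize(r, ref, step=1.0, levels=4):
    """Illustrative few-bit quantizer (NOT the paper's QuBan scheme).

    Stochastic rounding keeps E[decoded reward] == r whenever r falls
    inside the grid; with `levels` grid points the index costs about
    log2(levels) bits per round. Clipping adds bias at the tails, which
    a real scheme would have to handle separately.
    """
    z = (r - ref) / step
    lo = np.floor(z)
    # Round up with probability equal to the fractional part -> unbiased.
    idx = int(lo + (np.random.rand() < z - lo))
    return int(np.clip(idx, -levels // 2, levels // 2 - 1))

def quantized_eps_greedy(true_means, T=10_000, eps=0.1, step=1.0):
    """eps-greedy run on decoded (quantized) rewards instead of raw ones."""
    K = len(true_means)
    counts, est = np.zeros(K), np.zeros(K)
    for _ in range(T):
        a = np.random.randint(K) if np.random.rand() < eps else int(np.argmax(est))
        r = true_means[a] + np.random.randn()     # reward observed at the agent
        idx = stochastic_quantize(r, ref=est[a])  # a few bits sent to the learner
        r_hat = est[a] + idx * step               # learner-side decode
        counts[a] += 1
        est[a] += (r_hat - est[a]) / counts[a]    # incremental mean update
    return est

print(quantized_eps_greedy(np.array([0.2, 0.5, 0.8])))
```

Centering the grid at `est[a]`, a quantity both sides can track, is what keeps the index small; the decoded estimates should still concentrate around the true means because the rounding is unbiased within the grid.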