Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions

Tal Lancewicki · Shahar Segal · Tomer Koren · Yishay Mansour

Keywords: [ Bandits ] [ Algorithms ] [ Reinforcement Learning and Planning ]

[ Visit Poster at Spot A1 in Virtual World ]
Wed 21 Jul 9 a.m. PDT — 11 a.m. PDT
Spotlight presentation: Bandits 1
Wed 21 Jul 7 a.m. PDT — 8 a.m. PDT


We study the stochastic Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm. We consider two settings: the *reward-dependent* delay setting, where realized delays may depend on the stochastic rewards, and the *reward-independent* delay setting. Our main contribution is algorithms that achieve near-optimal regret in each setting, with an additional additive dependence on the quantiles of the delay distribution. Our results make no assumptions on the delay distributions: in particular, we do not assume they come from any parametric family, and we allow for unbounded support and unbounded expectation; we further allow for infinite delays, where the algorithm might occasionally not observe any feedback at all.
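To make the feedback model concrete, the following is a minimal simulation sketch of a stochastic bandit with random, possibly unbounded delays. The learner here is plain UCB1 run on whatever feedback has arrived so far; it is an illustrative baseline for the setting, not the paper's algorithm, and the heavy-tailed Pareto delay sampler in the usage line is an arbitrary choice for demonstration.

```python
import math
import random

def simulate_delayed_bandit(means, horizon, delay_sampler, seed=0):
    """Toy K-armed Bernoulli bandit in which the reward of each pull is
    revealed only after a random delay; a delay reaching past the horizon
    is effectively infinite (the feedback is never observed).  The learner
    is UCB1 on the observed data -- an illustrative baseline only."""
    rng = random.Random(seed)
    K = len(means)
    counts = [0] * K    # number of observed samples per arm
    sums = [0.0] * K    # sum of observed rewards per arm
    pending = {}        # arrival round -> list of (arm, reward) pairs
    total_reward = 0.0
    for t in range(horizon):
        # Deliver all feedback whose delay has elapsed by round t.
        for arm, r in pending.pop(t, []):
            counts[arm] += 1
            sums[arm] += r
        # UCB1 choice on observed data; pull any still-unobserved arm first.
        if any(c == 0 for c in counts):
            a = counts.index(0)
        else:
            a = max(range(K), key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2 * math.log(t + 1) / counts[i]))
        r = 1.0 if rng.random() < means[a] else 0.0
        total_reward += r
        arrival = t + 1 + delay_sampler(rng)
        if arrival < horizon:  # otherwise the feedback is never delivered
            pending.setdefault(arrival, []).append((a, r))
    # Pseudo-regret against always playing the best arm in expectation.
    return horizon * max(means) - total_reward
```

For example, `simulate_delayed_bandit([0.6, 0.4], 5000, lambda rng: int(rng.paretovariate(1.1)))` runs the learner under heavy-tailed delays whose expectation is large; the paper's point is that regret guarantees can depend only on quantiles of this delay distribution, which remain well-behaved even when its expectation is unbounded.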
