Timezone: »

Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions
Tal Lancewicki · Shahar Segal · Tomer Koren · Yishay Mansour

Wed Jul 21 09:00 AM -- 11:00 AM (PDT) @ None #None

We study the stochastic Multi-Armed Bandit~(MAB) problem with random delays in the feedback received by the algorithm. We consider two settings: the {\it reward dependent} delay setting, where realized delays may depend on the stochastic rewards, and the {\it reward-independent} delay setting. Our main contribution is algorithms that achieve near-optimal regret in each of the settings, with an additional additive dependence on the quantiles of the delay distribution. Our results do not make any assumptions on the delay distributions: in particular, we do not assume they come from any parametric family of distributions and allow for unbounded support and expectation; we further allow for the case of infinite delays where the algorithm might occasionally not observe any feedback.

Author Information

Tal Lancewicki (Tel-Aviv University)
Shahar Segal (Tel Aviv University)
Tomer Koren (Tel Aviv University and Google)
Yishay Mansour (Google and Tel Aviv University)

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors