Spotlight
Adapting to Delays and Data in Adversarial Multi-Armed Bandits
András György · Pooria Joulani
We consider the adversarial multi-armed bandit problem under delayed feedback. We analyze variants of the Exp3 algorithm that tune their step size using only information (about the losses and delays) available at the time of the decisions, and obtain regret guarantees that adapt to the observed (rather than the worst-case) sequences of delays and/or losses. First, through a remarkably simple proof technique, we show that with proper tuning of the step size, the algorithm achieves an optimal (up to logarithmic factors) regret of order $\sqrt{\log(K)(TK + D)}$ both in expectation and in high probability, where $K$ is the number of arms, $T$ is the time horizon, and $D$ is the cumulative delay. The high-probability version of the bound, which is the first high-probability delay-adaptive bound in the literature, crucially depends on the use of implicit exploration in estimating the losses. Then, following Zimmert and Seldin (2019), we extend these results so that the algorithm can "skip" rounds with large delays, resulting in regret bounds of order $\sqrt{TK\log(K)} + R + \sqrt{D_{\bar{R}}\log(K)}$, where $R$ is an arbitrary set of rounds (which are skipped) and $D_{\bar{R}}$ is the cumulative delay of the feedback for other rounds. Finally, we present another, data-adaptive (AdaGrad-style) version of the algorithm for which the regret adapts to the observed (delayed) losses instead of only adapting to the cumulative delay (this algorithm requires an a priori upper bound on the maximum delay, or the advance knowledge of the delay for each decision when it is made). The resulting bound can be orders of magnitude smaller on benign problems, and it can be shown that the delay only affects the regret through the loss of the best arm.
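To make the mechanics concrete, here is a minimal, illustrative Python sketch of an Exp3-style learner under delayed feedback. It is not the authors' exact algorithm: the step-size schedule and delay bookkeeping are simplified stand-ins that mirror the $\sqrt{\log(K)/(Kt + D)}$ rate from the abstract, and the function name and interface (`delay_adaptive_exp3`, fixed `losses`/`delays` arrays) are hypothetical. The loss estimator divides by $p + \gamma$ rather than $p$, which is the "implicit exploration" device the abstract credits for the high-probability bound.

```python
import math
import random

def delay_adaptive_exp3(losses, delays, K, gamma=0.05, seed=0):
    """Illustrative delay-adaptive Exp3 sketch (not the paper's exact method).

    losses[t][a] in [0, 1] is the loss of arm a at round t;
    delays[t] >= 0 means feedback for round t arrives d rounds later.
    Returns the sequence of arms played.
    """
    rng = random.Random(seed)
    T = len(losses)
    L_hat = [0.0] * K   # cumulative importance-weighted loss estimates
    pending = {}        # arrival round -> [(play round, arm, prob at play time)]
    D_obs = 0           # simple running proxy for the cumulative delay
    played = []
    for t in range(T):
        # Incorporate feedback arriving now, with implicit exploration:
        # dividing by p + gamma (instead of p) caps the estimates' variance.
        for (s, a, p_sa) in pending.pop(t, []):
            L_hat[a] += losses[s][a] / (p_sa + gamma)
        # Step size tuned only from information available at decision time,
        # mimicking the sqrt(log K / (K t + D)) schedule of the abstract.
        eta = math.sqrt(math.log(K) / (K * (t + 1) + D_obs + 1))
        m = min(L_hat)  # subtract the minimum for numerical stability
        w = [math.exp(-eta * (l - m)) for l in L_hat]
        Z = sum(w)
        p = [wi / Z for wi in w]
        a = rng.choices(range(K), weights=p)[0]
        played.append(a)
        d = delays[t]
        D_obs += d
        # Feedback for round t becomes observable at round t + d + 1.
        pending.setdefault(t + d + 1, []).append((t, a, p[a]))
    return played
```

With uniform delays the schedule simply shrinks the step size faster, which is the intuitive effect of the $D$ term in the regret bound; the paper's skipping and AdaGrad-style variants refine this bookkeeping further.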
Author Information
András György (DeepMind)
Pooria Joulani (DeepMind)
Related Events (a corresponding poster, oral, or spotlight)

2021 Poster: Adapting to Delays and Data in Adversarial Multi-Armed Bandits »
Thu. Jul 22nd 04:00 – 06:00 PM Room Virtual
More from the Same Authors

2022 : Non-Stationary Bandits and Meta-Learning with a Small Set of Optimal Arms »
MohammadJavad Azizi · Thang Duong · Yasin Abbasi-Yadkori · Claire Vernade · András György · Mohammad Ghavamzadeh 
2022 : Improved Generalization Bounds for Transfer Learning via Neural Collapse »
Tomer Galanti · András György · Marcus Hutter 
2020 Poster: Non-Stationary Delayed Bandits with Intermediate Observations »
Claire Vernade · András György · Timothy Mann 
2020 Poster: A simpler approach to accelerated optimization: iterative averaging meets optimism »
Pooria Joulani · Anant Raj · András György · Csaba Szepesvari 
2019 Poster: Learning from Delayed Outcomes via Proxies with Applications to Recommender Systems »
Timothy Mann · Sven Gowal · András György · Huiyi Hu · Ray Jiang · Balaji Lakshminarayanan · Prav Srinivasan 
2019 Oral: Learning from Delayed Outcomes via Proxies with Applications to Recommender Systems »
Timothy Mann · Sven Gowal · András György · Huiyi Hu · Ray Jiang · Balaji Lakshminarayanan · Prav Srinivasan 
2019 Poster: CapsAndRuns: An Improved Method for Approximately Optimal Algorithm Configuration »
Gellért Weisz · András György · Csaba Szepesvari 
2019 Oral: CapsAndRuns: An Improved Method for Approximately Optimal Algorithm Configuration »
Gellért Weisz · András György · Csaba Szepesvari 
2018 Poster: LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration »
Gellért Weisz · András György · Csaba Szepesvari 
2018 Oral: LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration »
Gellért Weisz · András György · Csaba Szepesvari