We study reward maximisation in a wide class of structured stochastic multi-armed bandit problems, where the mean rewards of arms satisfy given structural constraints, e.g., linear, unimodal, or sparse. Our aim is to develop methods that are \emph{flexible} (in that they easily adapt to different structures), \emph{powerful} (in that they perform well empirically and/or provably match instance-dependent lower bounds) and \emph{efficient} (in that the per-round computational burden is small). We develop asymptotically optimal algorithms from instance-dependent lower bounds using iterative saddle-point solvers. Our approach generalises recent iterative methods for pure exploration to reward maximisation, where a major challenge arises from the estimation of the sub-optimality gaps and their reciprocals. Still, we manage to achieve all the above desiderata. Notably, our technique avoids the computational cost of the full-blown saddle-point oracle employed by previous work, while at the same time enabling finite-time regret bounds. Our experiments reveal that our method successfully leverages the structural assumptions, while its regret is at worst comparable to that of vanilla UCB.
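To give a rough feel for the saddle-point view underlying such lower-bound-based methods, here is a minimal sketch (not the paper's algorithm) of the pure-exploration-style game that the abstract says is generalised to reward maximisation: a learner updates a sampling allocation over arms with exponentiated gradient, playing against an adversary who best-responds with the closest confusing instance. The setup assumes unit-variance Gaussian arms and a structure given as a finite set of candidate mean vectors; the names `kl_gauss`, `best_response` and `approx_optimal_allocation` are hypothetical.

```python
# Illustrative sketch only: iterative learner-vs-best-response approximation
# of the max-min allocation in the instance-dependent lower bound for a
# structured Gaussian bandit. Not the authors' method; all names are made up.
import numpy as np

def kl_gauss(x, y):
    # KL divergence between unit-variance Gaussians N(x, 1) and N(y, 1).
    return 0.5 * (x - y) ** 2

def best_response(mu, w, structure):
    # Adversary: among candidate models whose best arm differs from mu's,
    # return the one minimising the w-weighted divergence to mu.
    star = np.argmax(mu)
    alts = [lam for lam in structure if np.argmax(lam) != star]
    return min(alts, key=lambda lam: np.sum(w * kl_gauss(mu, lam)))

def approx_optimal_allocation(mu, structure, n_iter=2000, lr=0.1):
    # Learner: exponentiated-gradient ascent on the simplex against the
    # adversary's best response; the averaged iterate approximates the
    # saddle-point (max-min) allocation.
    K = len(mu)
    w = np.ones(K) / K
    w_avg = np.zeros(K)
    for _ in range(n_iter):
        lam = best_response(mu, w, structure)
        grad = kl_gauss(mu, lam)          # supergradient of the min over alternatives
        w = w * np.exp(lr * grad)
        w /= w.sum()
        w_avg += w
    return w_avg / n_iter

# Toy example: three arms, structure given as a finite set of mean vectors.
mu = np.array([1.0, 0.8, 0.2])
structure = [np.array([1.0, 0.8, 0.2]),
             np.array([0.8, 1.0, 0.2]),
             np.array([0.5, 0.2, 1.0])]
print(approx_optimal_allocation(mu, structure))
```

In a regret-minimisation setting, the same iterative game would additionally have to weigh the allocation by estimated sub-optimality gaps, which is where the estimation challenge mentioned in the abstract enters.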
Author Information
Rémy Degenne (Inria Paris)
Han Shao (Toyota Technological Institute at Chicago)
Wouter Koolen (Centrum Wiskunde & Informatica, Amsterdam)
More from the Same Authors
- 2022 : Choosing Answers in Epsilon-Best-Answer Identification for Linear Bandits »
  Marc Jourdan · Rémy Degenne
- 2022 Workshop: Complex feedback in online learning »
  Rémy Degenne · Pierre Gaillard · Wouter Koolen · Aadirupa Saha
- 2022 Poster: Choosing Answers in Epsilon-Best-Answer Identification for Linear Bandits »
  Marc Jourdan · Rémy Degenne
- 2022 Spotlight: Choosing Answers in Epsilon-Best-Answer Identification for Linear Bandits »
  Marc Jourdan · Rémy Degenne
- 2021 Poster: One for One, or All for All: Equilibria and Optimality of Collaboration in Federated Learning »
  Avrim Blum · Nika Haghtalab · Richard Lanas Phillips · Han Shao
- 2021 Spotlight: One for One, or All for All: Equilibria and Optimality of Collaboration in Federated Learning »
  Avrim Blum · Nika Haghtalab · Richard Lanas Phillips · Han Shao
- 2020 Poster: Gamification of Pure Exploration for Linear Bandits »
  Rémy Degenne · Pierre Menard · Xuedong Shang · Michal Valko