In many online decision processes, the optimizing agent is called to choose between large numbers of alternatives with many inherent similarities; in turn, these similarities imply closely correlated losses that may confound standard discrete choice models and bandit algorithms. We study this question in the context of nested bandits, a class of adversarial multi-armed bandit problems where the learner seeks to minimize their regret in the presence of a large number of distinct alternatives with a hierarchy of embedded (non-combinatorial) similarities. In this setting, optimal algorithms based on the exponential weights blueprint (like Hedge, EXP3, and their variants) may incur significant regret because they tend to spend excessive amounts of time exploring irrelevant alternatives with similar, suboptimal costs. To account for this, we propose a nested exponential weights (NEW) algorithm that performs a layered exploration of the learner's set of alternatives based on a nested, step-by-step selection method. In so doing, we obtain a series of tight bounds for the learner's regret showing that online learning problems with a high degree of similarity between alternatives can be resolved efficiently, without a red bus / blue bus paradox occurring.
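The layered selection idea can be illustrated with a minimal sketch, assuming a two-level hierarchy and a nested logit choice rule. This is not the authors' implementation: the names `nested_logit_probs` and `new_step`, the temperature parameters `mu_class` and `mu_root`, and the importance-weighted score update are all assumptions made for illustration. The learner first picks a similarity class, then an alternative within that class, so that near-identical alternatives share one class's probability mass rather than each claiming a full share.

```python
import math
import random


def nested_logit_probs(scores, classes, mu_class=0.1, mu_root=1.0):
    """Two-level nested logit choice: return {arm: probability}.

    scores   -- dict mapping each arm to its current score (higher is better)
    classes  -- dict mapping each class name to the list of arms it contains
    mu_class -- hypothetical temperature used inside each class
    mu_root  -- hypothetical temperature used when choosing between classes
    """
    # Value of a class: mu_class * log(sum(exp(score / mu_class))),
    # computed with the max subtracted for numerical stability.
    class_value = {}
    for name, arms in classes.items():
        m = max(scores[a] for a in arms)
        total = sum(math.exp((scores[a] - m) / mu_class) for a in arms)
        class_value[name] = m + mu_class * math.log(total)

    # Logit choice between classes, based on the class values.
    top = max(class_value.values())
    class_w = {n: math.exp((v - top) / mu_root) for n, v in class_value.items()}
    z = sum(class_w.values())

    # Logit choice inside each class, multiplied by the class probability.
    probs = {}
    for name, arms in classes.items():
        m = max(scores[a] for a in arms)
        arm_w = {a: math.exp((scores[a] - m) / mu_class) for a in arms}
        za = sum(arm_w.values())
        for a in arms:
            probs[a] = (class_w[name] / z) * (arm_w[a] / za)
    return probs


def new_step(scores, classes, loss_fn, step_size, rng, **mu):
    """One bandit round: sample an arm, observe its loss, update its score."""
    probs = nested_logit_probs(scores, classes, **mu)
    arms = list(probs)
    arm = rng.choices(arms, weights=[probs[a] for a in arms])[0]
    loss = loss_fn(arm)
    # Importance-weighted loss estimate (an assumption of this sketch):
    # only the sampled arm's score is updated.
    scores[arm] -= step_size * loss / probs[arm]
    return arm, loss


# Red bus / blue bus: one car and two near-identical buses, all scored equally.
classes = {"car": ["car"], "bus": ["red_bus", "blue_bus"]}
scores = {"car": 0.0, "red_bus": 0.0, "blue_bus": 0.0}
flat = nested_logit_probs(scores, classes, mu_class=1.0, mu_root=1.0)
nested = nested_logit_probs(scores, classes, mu_class=0.01, mu_root=1.0)
```

With `mu_class` equal to `mu_root`, the rule reduces to an ordinary logit choice and each alternative gets probability 1/3. With a much smaller `mu_class`, the two buses are treated as nearly one alternative, so the car gets probability close to 1/2 and each bus close to 1/4.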
Author Information
Matthieu Martin (Criteo AI Lab)
Panayotis Mertikopoulos (CNRS and Criteo AI Lab)
Thibaud J Rahier (INRIA)
Houssam Zenati (Criteo, INRIA)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Spotlight: Nested Bandits
  Thu. Jul 21st 03:35 -- 03:40 PM, Room 307
More from the Same Authors
- 2022: A Bias-Variance Analysis of Weight Averaging for OOD Generalization
  Alexandre Ramé · Matthieu Kirchmeyer · Thibaud J Rahier · Alain Rakotomamonjy · Patrick Gallinari · Matthieu Cord
- 2023 Poster: Sequential Counterfactual Risk Minimization
  Houssam Zenati · Eustache Diemert · Matthieu Martin · Julien Mairal · Pierre Gaillard
- 2023 Poster: Multi-Agent Online Optimization with Delays: Asynchronicity, Adaptivity, and Optimism
  Yu-Guan Hsieh · Franck Iutzeler · Jérôme Malick · Panayotis Mertikopoulos
- 2022 Poster: UnderGrad: A Universal Black-Box Optimization Method with Almost Dimension-Free Convergence Rate Guarantees
  Kimon Antonakopoulos · Dong Quan Vu · Volkan Cevher · Kfir Levy · Panayotis Mertikopoulos
- 2022 Oral: UnderGrad: A Universal Black-Box Optimization Method with Almost Dimension-Free Convergence Rate Guarantees
  Kimon Antonakopoulos · Dong Quan Vu · Volkan Cevher · Kfir Levy · Panayotis Mertikopoulos
- 2022 Poster: AdaGrad Avoids Saddle Points
  Kimon Antonakopoulos · Panayotis Mertikopoulos · Georgios Piliouras · Xiao Wang
- 2022 Spotlight: AdaGrad Avoids Saddle Points
  Kimon Antonakopoulos · Panayotis Mertikopoulos · Georgios Piliouras · Xiao Wang
- 2021 Poster: The Limits of Min-Max Optimization Algorithms: Convergence to Spurious Non-Critical Sets
  Ya-Ping Hsieh · Panayotis Mertikopoulos · Volkan Cevher
- 2021 Poster: Regret Minimization in Stochastic Non-Convex Learning via a Proximal-Gradient Approach
  Nadav Hallak · Panayotis Mertikopoulos · Volkan Cevher
- 2021 Spotlight: Regret Minimization in Stochastic Non-Convex Learning via a Proximal-Gradient Approach
  Nadav Hallak · Panayotis Mertikopoulos · Volkan Cevher
- 2021 Oral: The Limits of Min-Max Optimization Algorithms: Convergence to Spurious Non-Critical Sets
  Ya-Ping Hsieh · Panayotis Mertikopoulos · Volkan Cevher
- 2021 Poster: Zeroth-Order Non-Convex Learning via Hierarchical Dual Averaging
  Amélie Héliou · Matthieu Martin · Panayotis Mertikopoulos · Thibaud J Rahier
- 2021 Spotlight: Zeroth-Order Non-Convex Learning via Hierarchical Dual Averaging
  Amélie Héliou · Matthieu Martin · Panayotis Mertikopoulos · Thibaud J Rahier
- 2020 Poster: Gradient-free Online Learning in Continuous Games with Delayed Rewards
  Amélie Héliou · Panayotis Mertikopoulos · Zhengyuan Zhou
- 2020 Poster: A new regret analysis for Adam-type algorithms
  Ahmet Alacaoglu · Yura Malitsky · Panayotis Mertikopoulos · Volkan Cevher
- 2020 Poster: Finite-Time Last-Iterate Convergence for Multi-Agent Learning in Games
  Tianyi Lin · Zhengyuan Zhou · Panayotis Mertikopoulos · Michael Jordan
- 2019 Poster: Cautious Regret Minimization: Online Optimization with Long-Term Budget Constraints
  Nikolaos Liakopoulos · Apostolos Destounis · Georgios Paschos · Thrasyvoulos Spyropoulos · Panayotis Mertikopoulos
- 2019 Oral: Cautious Regret Minimization: Online Optimization with Long-Term Budget Constraints
  Nikolaos Liakopoulos · Apostolos Destounis · Georgios Paschos · Thrasyvoulos Spyropoulos · Panayotis Mertikopoulos
- 2018 Poster: Distributed Asynchronous Optimization with Unbounded Delays: How Slow Can You Go?
  Zhengyuan Zhou · Panayotis Mertikopoulos · Nicholas Bambos · Peter Glynn · Yinyu Ye · Li-Jia Li · Li Fei-Fei
- 2018 Oral: Distributed Asynchronous Optimization with Unbounded Delays: How Slow Can You Go?
  Zhengyuan Zhou · Panayotis Mertikopoulos · Nicholas Bambos · Peter Glynn · Yinyu Ye · Li-Jia Li · Li Fei-Fei