Timezone: »
We introduce and analyze a best arm identification problem in the rested bandit setting, wherein arms are themselves learning algorithms whose expected losses decrease with the number of times the arm has been played. The shape of the expected loss functions is similar across arms, and is assumed to be available up to unknown parameters that have to be learned on the fly. We define a novel notion of regret for this problem, where we compare to the policy that always plays the arm having the smallest expected loss at the end of the game. We analyze an arm elimination algorithm whose regret vanishes as the time horizon increases. The actual rate of convergence depends in a detailed way on the postulated functional form of the expected losses. We complement our analysis with lower bounds, indicating strengths and limitations of the proposed solution.
Author Information
Leonardo Cella (Italian Institute of Technology)
Massimiliano Pontil (Istituto Italiano di Tecnologia and University College London)
-
Claudio Gentile (Google Research)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Poster: Best Model Identification: A Rested Bandit Formulation »
Wed. Jul 21st 04:00 -- 06:00 PM Room Virtual
More from the Same Authors
-
2022 Poster: Distribution Regression with Sliced Wasserstein Kernels »
Dimitri Marie Meunier · Massimiliano Pontil · Carlo Ciliberto -
2022 Spotlight: Distribution Regression with Sliced Wasserstein Kernels »
Dimitri Marie Meunier · Massimiliano Pontil · Carlo Ciliberto -
2022 Poster: Achieving Minimax Rates in Pool-Based Batch Active Learning »
Claudio Gentile · Zhilei Wang · Tong Zhang -
2022 Spotlight: Achieving Minimax Rates in Pool-Based Batch Active Learning »
Claudio Gentile · Zhilei Wang · Tong Zhang -
2021 Poster: Hierarchical Clustering of Data Streams: Scalable Algorithms and Approximation Guarantees »
Anand Rajagopalan · Fabio Vitale · Danny Vainstein · Gui Citovsky · Cecilia Procopiuc · Claudio Gentile -
2021 Spotlight: Hierarchical Clustering of Data Streams: Scalable Algorithms and Approximation Guarantees »
Anand Rajagopalan · Fabio Vitale · Danny Vainstein · Gui Citovsky · Cecilia Procopiuc · Claudio Gentile -
2021 Poster: Dynamic Balancing for Model Selection in Bandits and RL »
Ashok Cutkosky · Christoph Dann · Abhimanyu Das · Claudio Gentile · Aldo Pacchiano · Manish Purohit -
2021 Spotlight: Dynamic Balancing for Model Selection in Bandits and RL »
Ashok Cutkosky · Christoph Dann · Abhimanyu Das · Claudio Gentile · Aldo Pacchiano · Manish Purohit -
2020 Poster: Adaptive Region-Based Active Learning »
Corinna Cortes · Giulia DeSalvo · Claudio Gentile · Mehryar Mohri · Ningshan Zhang -
2020 Poster: Online Learning with Dependent Stochastic Feedback Graphs »
Corinna Cortes · Giulia DeSalvo · Claudio Gentile · Mehryar Mohri · Ningshan Zhang -
2020 Poster: On the Iteration Complexity of Hypergradient Computation »
Riccardo Grazzi · Luca Franceschi · Massimiliano Pontil · Saverio Salzo -
2020 Poster: Meta-learning with Stochastic Linear Bandits »
Leonardo Cella · Alessandro Lazaric · Massimiliano Pontil -
2019 Poster: Online Learning with Sleeping Experts and Feedback Graphs »
Corinna Cortes · Giulia DeSalvo · Claudio Gentile · Mehryar Mohri · Scott Yang -
2019 Oral: Online Learning with Sleeping Experts and Feedback Graphs »
Corinna Cortes · Giulia DeSalvo · Claudio Gentile · Mehryar Mohri · Scott Yang -
2019 Poster: Active Learning with Disagreement Graphs »
Corinna Cortes · Giulia DeSalvo · Mehryar Mohri · Ningshan Zhang · Claudio Gentile -
2019 Poster: Learning Discrete Structures for Graph Neural Networks »
Luca Franceschi · Mathias Niepert · Massimiliano Pontil · Xiao He -
2019 Oral: Learning Discrete Structures for Graph Neural Networks »
Luca Franceschi · Mathias Niepert · Massimiliano Pontil · Xiao He -
2019 Oral: Active Learning with Disagreement Graphs »
Corinna Cortes · Giulia DeSalvo · Mehryar Mohri · Ningshan Zhang · Claudio Gentile -
2019 Poster: Leveraging Low-Rank Relations Between Surrogate Tasks in Structured Prediction »
Giulia Luise · Dimitrios Stamos · Massimiliano Pontil · Carlo Ciliberto -
2019 Oral: Leveraging Low-Rank Relations Between Surrogate Tasks in Structured Prediction »
Giulia Luise · Dimitrios Stamos · Massimiliano Pontil · Carlo Ciliberto -
2018 Poster: Online Learning with Abstention »
Corinna Cortes · Giulia DeSalvo · Claudio Gentile · Mehryar Mohri · Scott Yang -
2018 Oral: Online Learning with Abstention »
Corinna Cortes · Giulia DeSalvo · Claudio Gentile · Mehryar Mohri · Scott Yang