We propose a framework for model selection by combining base algorithms in stochastic bandits and reinforcement learning. We require a candidate regret bound for each base algorithm that may or may not hold. We select base algorithms to play in each round using a "balancing condition" on the candidate regret bounds. Our approach simultaneously recovers previous worst-case regret bounds, while also obtaining much smaller regret in natural scenarios when some base learners significantly exceed their candidate bounds. Our framework is relevant in many settings, including linear bandits and MDPs with nested function classes, linear bandits with unknown misspecification, and tuning confidence parameters of algorithms such as LinUCB. Moreover, unlike recent efforts in model selection for linear stochastic bandits, our approach can be extended to consider adversarial rather than stochastic contexts.
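To make the idea of balancing candidate regret bounds concrete, here is a minimal sketch. It is not the authors' algorithm. It assumes, purely for illustration, that each base learner i comes with a candidate bound of the form c_i * sqrt(n_i), where n_i is the number of rounds that learner has been played, and that each round goes to the learner whose candidate bound is currently smallest. The names `select_learner`, `simulate`, and `bound_coeffs` are hypothetical, and the sketch omits any test for whether a candidate bound actually holds.

```python
import math


def select_learner(play_counts, bound_coeffs):
    """Pick the learner whose assumed candidate bound c_i * sqrt(n_i) is smallest.

    Ties are broken in favour of the lowest index.
    """
    values = [c * math.sqrt(n) for c, n in zip(bound_coeffs, play_counts)]
    return min(range(len(values)), key=values.__getitem__)


def simulate(bound_coeffs, rounds):
    """Run the selection rule for a number of rounds and return the play counts."""
    counts = [0] * len(bound_coeffs)
    for _ in range(rounds):
        i = select_learner(counts, bound_coeffs)
        counts[i] += 1  # the chosen learner plays this round
    return counts


if __name__ == "__main__":
    # Two hypothetical learners; the second has a candidate bound twice as large.
    print(simulate([1.0, 2.0], 100))
```

Under these assumptions the rule keeps the candidate bounds of all learners approximately equal, so a learner with a larger coefficient is played less often. In the two-learner example, the first learner should end up being played roughly four times as often as the second.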
Author Information
Ashok Cutkosky (Boston University)
Christoph Dann (Google)
Abhimanyu Das (Google)
Claudio Gentile (Google Research)
Aldo Pacchiano (UC Berkeley)
Manish Purohit (Google Research)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Spotlight: Dynamic Balancing for Model Selection in Bandits and RL »
Thu. Jul 22nd 12:45 -- 12:50 AM
More from the Same Authors
-
2021 : Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity »
Dhruv Malik · Aldo Pacchiano · Vishwak Srinivasan · Yuanzhi Li -
2021 : Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection »
Matteo Papini · Andrea Tirinzoni · Aldo Pacchiano · Marcello Restelli · Alessandro Lazaric · Matteo Pirotta -
2021 : Estimating Optimal Policy Value in Linear Contextual Bandits beyond Gaussianity »
Jonathan Lee · Weihao Kong · Aldo Pacchiano · Vidya Muthukumar · Emma Brunskill -
2021 : Meta Learning MDPs with linear transition models »
Robert Müller · Aldo Pacchiano · Jack Parker-Holder -
2021 : On the Theory of Reinforcement Learning with Once-per-Episode Feedback »
Niladri Chatterji · Aldo Pacchiano · Peter Bartlett · Michael Jordan -
2022 : Optimal Parameter-free Online Learning with Switching Cost »
Zhiyu Zhang · Ashok Cutkosky · Ioannis Paschalidis -
2023 : Experiment Planning with Function Approximation »
Aldo Pacchiano · Jonathan Lee · Emma Brunskill -
2023 : Anytime Model Selection in Linear Bandits »
Parnian Kassraie · Aldo Pacchiano · Nicolas Emmenegger · Andreas Krause -
2023 : Undo Maps: A Tool for Adapting Policies to Perceptual Distortions »
Abhi Gupta · Ted Moskovitz · David Alvarez-Melis · Aldo Pacchiano -
2023 : In-Context Decision-Making from Supervised Pretraining »
Jonathan Lee · Annie Xie · Aldo Pacchiano · Yash Chandak · Chelsea Finn · Ofir Nachum · Emma Brunskill -
2023 Poster: Learning in POMDPs is Sample-Efficient with Hindsight Observability »
Jonathan Lee · Alekh Agarwal · Christoph Dann · Tong Zhang -
2023 Poster: Leveraging Offline Data in Online Reinforcement Learning »
Andrew Wagenmaker · Aldo Pacchiano -
2023 Poster: Reinforcement Learning Can Be More Efficient with Multiple Rewards »
Christoph Dann · Yishay Mansour · Mehryar Mohri -
2023 Poster: Best of Both Worlds Policy Optimization »
Christoph Dann · Chen-Yu Wei · Julian Zimmert -
2023 Poster: Optimal Stochastic Non-smooth Non-convex Optimization through Online-to-Non-convex Conversion »
Ashok Cutkosky · Harsh Mehta · Francesco Orabona -
2023 Oral: Best of Both Worlds Policy Optimization »
Christoph Dann · Chen-Yu Wei · Julian Zimmert -
2023 Poster: Unconstrained Online Learning with Unbounded Losses »
Andrew Jacobsen · Ashok Cutkosky -
2023 Poster: Bandit Online Linear Optimization with Hints and Queries »
Aditya Bhaskara · Ashok Cutkosky · Ravi Kumar · Manish Purohit -
2023 Poster: Efficient List-Decodable Regression using Batches »
Abhimanyu Das · Ayush Jain · Weihao Kong · Rajat Sen -
2022 Poster: Parsimonious Learning-Augmented Caching »
Sungjin Im · Ravi Kumar · Aditya Petety · Manish Purohit -
2022 Spotlight: Parsimonious Learning-Augmented Caching »
Sungjin Im · Ravi Kumar · Aditya Petety · Manish Purohit -
2022 Poster: PDE-Based Optimal Strategy for Unconstrained Online Learning »
Zhiyu Zhang · Ashok Cutkosky · Ioannis Paschalidis -
2022 Poster: Achieving Minimax Rates in Pool-Based Batch Active Learning »
Claudio Gentile · Zhilei Wang · Tong Zhang -
2022 Spotlight: PDE-Based Optimal Strategy for Unconstrained Online Learning »
Zhiyu Zhang · Ashok Cutkosky · Ioannis Paschalidis -
2022 Spotlight: Achieving Minimax Rates in Pool-Based Batch Active Learning »
Claudio Gentile · Zhilei Wang · Tong Zhang -
2022 Poster: Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback »
Tianyi Lin · Aldo Pacchiano · Yaodong Yu · Michael Jordan -
2022 Spotlight: Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback »
Tianyi Lin · Aldo Pacchiano · Yaodong Yu · Michael Jordan -
2021 Poster: Hierarchical Clustering of Data Streams: Scalable Algorithms and Approximation Guarantees »
Anand Rajagopalan · Fabio Vitale · Danny Vainstein · Gui Citovsky · Cecilia Procopiuc · Claudio Gentile -
2021 Spotlight: Hierarchical Clustering of Data Streams: Scalable Algorithms and Approximation Guarantees »
Anand Rajagopalan · Fabio Vitale · Danny Vainstein · Gui Citovsky · Cecilia Procopiuc · Claudio Gentile -
2021 Poster: Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity »
Dhruv Malik · Aldo Pacchiano · Vishwak Srinivasan · Yuanzhi Li -
2021 Poster: Robust Pure Exploration in Linear Bandits with Limited Budget »
Ayya Alieva · Ashok Cutkosky · Abhimanyu Das -
2021 Spotlight: Robust Pure Exploration in Linear Bandits with Limited Budget »
Ayya Alieva · Ashok Cutkosky · Abhimanyu Das -
2021 Spotlight: Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity »
Dhruv Malik · Aldo Pacchiano · Vishwak Srinivasan · Yuanzhi Li -
2021 Poster: Best Model Identification: A Rested Bandit Formulation »
Leonardo Cella · Massimiliano Pontil · Claudio Gentile -
2021 Spotlight: Best Model Identification: A Rested Bandit Formulation »
Leonardo Cella · Massimiliano Pontil · Claudio Gentile -
2020 Poster: On Thompson Sampling with Langevin Algorithms »
Eric Mazumdar · Aldo Pacchiano · Yian Ma · Michael Jordan · Peter Bartlett -
2020 Poster: Accelerated Message Passing for Entropy-Regularized MAP Inference »
Jonathan Lee · Aldo Pacchiano · Peter Bartlett · Michael Jordan -
2020 Poster: Parameter-free, Dynamic, and Strongly-Adaptive Online Learning »
Ashok Cutkosky -
2020 Poster: Momentum Improves Normalized SGD »
Ashok Cutkosky · Harsh Mehta -
2020 Poster: Online Learning with Imperfect Hints »
Aditya Bhaskara · Ashok Cutkosky · Ravi Kumar · Manish Purohit -
2020 Poster: Stochastic Flows and Geometric Optimization on the Orthogonal Group »
Krzysztof Choromanski · David Cheikhi · Jared Quincy Davis · Valerii Likhosherstov · Achille Nazaret · Achraf Bahamou · Xingyou Song · Mrugank Akarte · Jack Parker-Holder · Jacob Bergquist · Yuan Gao · Aldo Pacchiano · Tamas Sarlos · Adrian Weller · Vikas Sindhwani -
2020 Poster: Adaptive Region-Based Active Learning »
Corinna Cortes · Giulia DeSalvo · Claudio Gentile · Mehryar Mohri · Ningshan Zhang -
2020 Poster: Online Learning with Dependent Stochastic Feedback Graphs »
Corinna Cortes · Giulia DeSalvo · Claudio Gentile · Mehryar Mohri · Ningshan Zhang -
2020 Poster: Learning to Score Behaviors for Guided Policy Optimization »
Aldo Pacchiano · Jack Parker-Holder · Yunhao Tang · Krzysztof Choromanski · Anna Choromanska · Michael Jordan -
2020 Poster: Ready Policy One: World Building Through Active Learning »
Philip Ball · Jack Parker-Holder · Aldo Pacchiano · Krzysztof Choromanski · Stephen Roberts -
2020 Tutorial: Parameter-free Online Optimization »
Francesco Orabona · Ashok Cutkosky -
2019 Poster: Matrix-Free Preconditioning in Online Learning »
Ashok Cutkosky · Tamas Sarlos -
2019 Poster: Anytime Online-to-Batch, Optimism and Acceleration »
Ashok Cutkosky -
2019 Poster: Online Learning with Sleeping Experts and Feedback Graphs »
Corinna Cortes · Giulia DeSalvo · Claudio Gentile · Mehryar Mohri · Scott Yang -
2019 Oral: Anytime Online-to-Batch, Optimism and Acceleration »
Ashok Cutkosky -
2019 Oral: Online Learning with Sleeping Experts and Feedback Graphs »
Corinna Cortes · Giulia DeSalvo · Claudio Gentile · Mehryar Mohri · Scott Yang -
2019 Oral: Matrix-Free Preconditioning in Online Learning »
Ashok Cutkosky · Tamas Sarlos -
2019 Poster: Surrogate Losses for Online Learning of Stepsizes in Stochastic Non-Convex Optimization »
Zhenxun Zhuang · Ashok Cutkosky · Francesco Orabona -
2019 Poster: Active Learning with Disagreement Graphs »
Corinna Cortes · Giulia DeSalvo · Mehryar Mohri · Ningshan Zhang · Claudio Gentile -
2019 Poster: Hiring Under Uncertainty »
Manish Purohit · Sreenivas Gollapudi · Manish Raghavan -
2019 Oral: Active Learning with Disagreement Graphs »
Corinna Cortes · Giulia DeSalvo · Mehryar Mohri · Ningshan Zhang · Claudio Gentile -
2019 Oral: Hiring Under Uncertainty »
Manish Purohit · Sreenivas Gollapudi · Manish Raghavan -
2019 Oral: Surrogate Losses for Online Learning of Stepsizes in Stochastic Non-Convex Optimization »
Zhenxun Zhuang · Ashok Cutkosky · Francesco Orabona -
2019 Poster: Policy Certificates: Towards Accountable Reinforcement Learning »
Christoph Dann · Lihong Li · Wei Wei · Emma Brunskill -
2019 Poster: Online learning with kernel losses »
Niladri Chatterji · Aldo Pacchiano · Peter Bartlett -
2019 Oral: Policy Certificates: Towards Accountable Reinforcement Learning »
Christoph Dann · Lihong Li · Wei Wei · Emma Brunskill -
2019 Oral: Online learning with kernel losses »
Niladri Chatterji · Aldo Pacchiano · Peter Bartlett -
2018 Poster: Online Learning with Abstention »
Corinna Cortes · Giulia DeSalvo · Claudio Gentile · Mehryar Mohri · Scott Yang -
2018 Oral: Online Learning with Abstention »
Corinna Cortes · Giulia DeSalvo · Claudio Gentile · Mehryar Mohri · Scott Yang -
2018 Poster: Decoupling Gradient-Like Learning Rules from Representations »
Philip Thomas · Christoph Dann · Emma Brunskill -
2018 Oral: Decoupling Gradient-Like Learning Rules from Representations »
Philip Thomas · Christoph Dann · Emma Brunskill