Timezone: »
A fundamental challenge in contextual bandits is to develop flexible, general-purpose algorithms with computational requirements no worse than classical supervised learning tasks such as classification and regression. Algorithms based on regression have shown promising empirical success, but theoretical guarantees have remained elusive except in special cases. We provide the first universal and optimal reduction from contextual bandits to online regression. We show how to transform any oracle for online regression with a given value function class into an algorithm for contextual bandits with the induced policy class, with no overhead in runtime or memory requirements. We characterize the minimax rates for contextual bandits with general, potentially nonparametric function classes, and show that our algorithm is minimax optimal whenever the oracle obtains the optimal rate for regression. Compared to previous results, our algorithm requires no distributional assumptions beyond realizability, and works even when contexts are chosen adversarially.
Author Information
Dylan Foster (MIT)
Alexander Rakhlin (MIT)
More from the Same Authors
-
2022 : Interaction-Grounded Learning with Action-inclusive Feedback »
Tengyang Xie · Akanksha Saran · Dylan Foster · Lekan Molu · Ida Momennejad · Nan Jiang · Paul Mineiro · John Langford -
2022 Poster: Contextual Bandits with Large Action Spaces: Made Practical »
Yinglun Zhu · Dylan Foster · John Langford · Paul Mineiro -
2022 Spotlight: Contextual Bandits with Large Action Spaces: Made Practical »
Yinglun Zhu · Dylan Foster · John Langford · Paul Mineiro -
2022 : Q&A II »
Dylan Foster · Alexander Rakhlin -
2022 : Bridging Learning and Decision Making: Part II »
Dylan Foster -
2022 : Q&A »
Dylan Foster · Alexander Rakhlin -
2022 Tutorial: Bridging Learning and Decision Making »
Dylan Foster · Alexander Rakhlin -
2022 : Bridging Learning and Decision Making: Part I »
Alexander Rakhlin -
2021 Poster: Top-k eXtreme Contextual Bandits with Arm Hierarchy »
Rajat Sen · Alexander Rakhlin · Lexing Ying · Rahul Kidambi · Dean Foster · Daniel Hill · Inderjit Dhillon -
2021 Spotlight: Top-k eXtreme Contextual Bandits with Arm Hierarchy »
Rajat Sen · Alexander Rakhlin · Lexing Ying · Rahul Kidambi · Dean Foster · Daniel Hill · Inderjit Dhillon -
2020 Poster: Naive Exploration is Optimal for Online LQR »
Max Simchowitz · Dylan Foster -
2020 Poster: Improved Bounds on Minimax Regret under Logarithmic Loss via Self-Concordance »
Blair Bilodeau · Dylan Foster · Daniel Roy -
2020 Poster: Logarithmic Regret for Adversarial Online Control »
Dylan Foster · Max Simchowitz -
2019 Poster: Near optimal finite time identification of arbitrary linear dynamical systems »
Tuhin Sarkar · Alexander Rakhlin -
2019 Poster: Distributed Learning with Sublinear Communication »
Jayadev Acharya · Christopher De Sa · Dylan Foster · Karthik Sridharan -
2019 Oral: Near optimal finite time identification of arbitrary linear dynamical systems »
Tuhin Sarkar · Alexander Rakhlin -
2019 Oral: Distributed Learning with Sublinear Communication »
Jayadev Acharya · Christopher De Sa · Dylan Foster · Karthik Sridharan -
2018 Poster: Practical Contextual Bandits with Regression Oracles »
Dylan Foster · Alekh Agarwal · Miroslav Dudik · Haipeng Luo · Robert Schapire -
2018 Oral: Practical Contextual Bandits with Regression Oracles »
Dylan Foster · Alekh Agarwal · Miroslav Dudik · Haipeng Luo · Robert Schapire