Designing efficient general-purpose contextual bandit algorithms that work with large (or even infinite) action spaces would facilitate application to important scenarios such as information retrieval, recommendation systems, and continuous control. While obtaining standard regret guarantees can be hopeless, alternative regret notions have been proposed to tackle the large action setting. We propose a smooth regret notion for contextual bandits, which dominates previously proposed alternatives. We design a statistically and computationally efficient algorithm for the proposed smooth regret that works with general function approximation under standard supervised oracles. We also present an adaptive algorithm that automatically adapts to any smoothness level. Our algorithms can be used to recover the previous minimax/Pareto optimal guarantees under the standard regret, e.g., in bandit problems with multiple best arms and Lipschitz/Hölder bandits. We conduct large-scale empirical evaluations demonstrating the efficacy of our proposed algorithms.
Author Information
Yinglun Zhu (University of Wisconsin-Madison)
Paul Mineiro (Microsoft)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Poster: Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces
  Tue. Jul 19th through Wed. Jul 20th, Room Hall E #1310
More from the Same Authors
- 2022 : Interaction-Grounded Learning with Action-inclusive Feedback
  Tengyang Xie · Akanksha Saran · Dylan Foster · Lekan Molu · Ida Momennejad · Nan Jiang · Paul Mineiro · John Langford
- 2023 : Time-uniform confidence bands for the CDF under nonstationarity
  Paul Mineiro · Steve Howard
- 2023 : LabelBench: A Comprehensive Framework for Benchmarking Label-Efficient Learning
  Jifan Zhang · Yifang Chen · Gregory Canal · Stephen Mussmann · Yinglun Zhu · Simon Du · Kevin Jamieson · Robert Nowak
- 2023 Poster: Infinite Action Contextual Bandits with Reusable Data Exhaust
  Mark Rucker · Yinglun Zhu · Paul Mineiro
- 2022 Poster: Contextual Bandits with Large Action Spaces: Made Practical
  Yinglun Zhu · Dylan Foster · John Langford · Paul Mineiro
- 2022 Spotlight: Contextual Bandits with Large Action Spaces: Made Practical
  Yinglun Zhu · Dylan Foster · John Langford · Paul Mineiro
- 2021 Poster: Off-Policy Confidence Sequences
  Nikos Karampatziakis · Paul Mineiro · Aaditya Ramdas
- 2021 Spotlight: Off-Policy Confidence Sequences
  Nikos Karampatziakis · Paul Mineiro · Aaditya Ramdas
- 2021 Poster: Interaction-Grounded Learning
  Tengyang Xie · John Langford · Paul Mineiro · Ida Momennejad
- 2021 Spotlight: Interaction-Grounded Learning
  Tengyang Xie · John Langford · Paul Mineiro · Ida Momennejad
- 2021 Poster: ChaCha for Online AutoML
  Qingyun Wu · Chi Wang · John Langford · Paul Mineiro · Marco Rossi
- 2021 Spotlight: ChaCha for Online AutoML
  Qingyun Wu · Chi Wang · John Langford · Paul Mineiro · Marco Rossi
- 2020 Poster: Robust Outlier Arm Identification
  Yinglun Zhu · Sumeet Katariya · Robert Nowak
- 2019 Poster: Contextual Memory Trees
  Wen Sun · Alina Beygelzimer · Hal Daumé III · John Langford · Paul Mineiro
- 2019 Oral: Contextual Memory Trees
  Wen Sun · Alina Beygelzimer · Hal Daumé III · John Langford · Paul Mineiro