For infinite action contextual bandits, smoothed regret and reduction to regression result in state-of-the-art online performance with computational cost independent of the action set: unfortunately, the resulting data exhaust does not have well-defined importance weights. This frustrates the execution of downstream data science processes such as offline model selection. In this paper we describe an online algorithm with an equivalent smoothed regret guarantee, but which generates well-defined importance weights: in exchange, the online computational cost increases, but only to order smoothness (i.e., still independent of the action set). This removes a key obstacle to adoption of smoothed regret in production scenarios.
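To see why well-defined importance weights matter for offline model selection, consider the standard inverse propensity scoring (IPS) estimator over logged bandit data. This is a minimal illustrative sketch, not the paper's algorithm: the tuple layout, `ips_value_estimate`, and `target_policy_prob` are hypothetical names, and the point is only that each logged record must carry the logging probability for the estimator to be computable at all.

```python
def ips_value_estimate(logs, target_policy_prob):
    """Estimate a target policy's value from logged bandit data via
    inverse propensity scoring (IPS).

    Each log entry is a tuple (context, action, logged_prob, reward).
    logged_prob is the probability the logging policy assigned to the
    chosen action; without it (the situation the abstract describes),
    the importance weight below is undefined and offline evaluation
    breaks down.
    """
    total = 0.0
    for context, action, logged_prob, reward in logs:
        # Importance weight: ratio of target to logging propensity.
        w = target_policy_prob(context, action) / logged_prob
        total += w * reward
    return total / len(logs)

# Toy example: logging policy uniform over 4 actions (prob 0.25);
# target policy deterministically plays action 0.
logs = [
    ("x1", 0, 0.25, 1.0),
    ("x2", 1, 0.25, 0.0),
    ("x3", 0, 0.25, 1.0),
    ("x4", 2, 0.25, 0.5),
]
target = lambda context, action: 1.0 if action == 0 else 0.0
print(ips_value_estimate(logs, target))  # prints 2.0
```

The estimator is unbiased when `logged_prob` is the true logging propensity, which is exactly the quantity the algorithm in this paper makes available in its data exhaust.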
Author Information
Mark Rucker (University of Virginia)
A research professional working toward a PhD at the University of Virginia, with a focus on reinforcement learning, statistical estimation, and human behavior modeling. Project-based experience in contextual bandits, inverse reinforcement learning, MDP design, experiment design, kernel-based function approximation, Python, Scala, Spark, MATLAB, CVX, R, MongoDB, DynamoDB, JavaScript, S3, CloudFront, AWS Lambda, EC2, SQL Server, and C#.
Yinglun Zhu (University of California, Riverside)
Paul Mineiro (Microsoft)
More from the Same Authors
- 2022: Interaction-Grounded Learning with Action-inclusive Feedback
  Tengyang Xie · Akanksha Saran · Dylan Foster · Lekan Molu · Ida Momennejad · Nan Jiang · Paul Mineiro · John Langford
- 2023: Time-uniform confidence bands for the CDF under nonstationarity
  Paul Mineiro · Steve Howard
- 2023: LabelBench: A Comprehensive Framework for Benchmarking Label-Efficient Learning
  Jifan Zhang · Yifang Chen · Gregory Canal · Stephen Mussmann · Yinglun Zhu · Simon Du · Kevin Jamieson · Robert Nowak
- 2022 Poster: Contextual Bandits with Large Action Spaces: Made Practical
  Yinglun Zhu · Dylan Foster · John Langford · Paul Mineiro
- 2022 Spotlight: Contextual Bandits with Large Action Spaces: Made Practical
  Yinglun Zhu · Dylan Foster · John Langford · Paul Mineiro
- 2022 Poster: Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces
  Yinglun Zhu · Paul Mineiro
- 2022 Oral: Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces
  Yinglun Zhu · Paul Mineiro
- 2022: What is new in Vowpal Wabbit 9
  Eduardo Salinas · Mark Rucker · Zakaria Mhammedi
- 2021 Poster: Off-Policy Confidence Sequences
  Nikos Karampatziakis · Paul Mineiro · Aaditya Ramdas
- 2021 Spotlight: Off-Policy Confidence Sequences
  Nikos Karampatziakis · Paul Mineiro · Aaditya Ramdas
- 2021 Poster: Interaction-Grounded Learning
  Tengyang Xie · John Langford · Paul Mineiro · Ida Momennejad
- 2021 Spotlight: Interaction-Grounded Learning
  Tengyang Xie · John Langford · Paul Mineiro · Ida Momennejad
- 2021 Poster: ChaCha for Online AutoML
  Qingyun Wu · Chi Wang · John Langford · Paul Mineiro · Marco Rossi
- 2021 Spotlight: ChaCha for Online AutoML
  Qingyun Wu · Chi Wang · John Langford · Paul Mineiro · Marco Rossi
- 2021: COBA
  Mark Rucker
- 2020 Poster: Robust Outlier Arm Identification
  Yinglun Zhu · Sumeet Katariya · Robert Nowak
- 2019 Poster: Contextual Memory Trees
  Wen Sun · Alina Beygelzimer · Hal Daumé III · John Langford · Paul Mineiro
- 2019 Oral: Contextual Memory Trees
  Wen Sun · Alina Beygelzimer · Hal Daumé III · John Langford · Paul Mineiro