Contextual bandits are a learning protocol encompassing applications such as news recommendation, advertising, and mobile health, in which an algorithm repeatedly observes some information about a user, decides what content to present, and accrues a reward if the presented content is successful. In this talk, I will focus on the fundamental task of evaluating a new policy given historic data. I will describe the asymptotically optimal approach of doubly robust (DR) estimation, the reasons for its shortcomings in finite samples, and how to overcome these shortcomings by directly optimizing a bound on finite-sample error. This optimization yields a new family of estimators that, like DR, leverage any direct model of rewards, but shrink the importance weights to obtain a better bias-variance tradeoff than DR. The error bounds can also be used to select the best among multiple reward predictors. Somewhat surprisingly, the reward predictors that work best with standard DR are not the same as those that work best with our modified DR. Our new estimator and model selection procedure perform extremely well across a wide variety of settings, so we expect they will enjoy broad practical use.
Based on joint work with Yi Su, Maria Dimakopoulou, and Akshay Krishnamurthy.
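To make the estimator being discussed concrete, here is a minimal numerical sketch (Python/NumPy) of doubly robust off-policy evaluation with a simple form of weight shrinkage. The function name, its arguments, and the clipping-based shrinkage are illustrative assumptions; the estimators in the talk instead choose the shrinkage map by directly optimizing a finite-sample error bound.

```python
# A minimal sketch of doubly robust (DR) off-policy evaluation with weight
# shrinkage. Names and the clipping rule below are illustrative assumptions,
# not the exact estimator from the talk.
import numpy as np

def dr_value_estimate(contexts, actions, rewards, logging_probs,
                      target_probs_all, reward_model, clip=None):
    """Estimate the value of a target policy from logged bandit data.

    contexts:          array (n, d), logged contexts
    actions:           array (n,), logged action indices
    rewards:           array (n,), observed rewards
    logging_probs:     array (n,), mu(a_i | x_i) under the logging policy
    target_probs_all:  array (n, K), pi(a | x_i) for every action a
    reward_model:      callable contexts -> (n, K) predicted rewards rhat(x, a)
    clip:              optional threshold at which importance weights are shrunk
    """
    n = len(rewards)
    rhat_all = reward_model(contexts)                      # (n, K)
    # Direct-model term: expected predicted reward under the target policy.
    direct = np.sum(target_probs_all * rhat_all, axis=1)   # (n,)
    # Importance weights of the logged actions.
    w = target_probs_all[np.arange(n), actions] / logging_probs
    if clip is not None:
        w = np.minimum(w, clip)                            # shrink large weights
    # Correction term: importance-weighted residual of the reward model.
    rhat_logged = rhat_all[np.arange(n), actions]
    correction = w * (rewards - rhat_logged)
    return np.mean(direct + correction)
```

With clip=None this reduces to the standard DR estimator; a finite clip introduces a small bias in exchange for lower variance, which is the kind of bias-variance tradeoff the talk's estimators tune explicitly via finite-sample error bounds.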
Author Information
Miroslav Dudik (Microsoft Research)

Miroslav Dudík is a Senior Principal Researcher in machine learning at Microsoft Research, NYC. His research focuses on combining theoretical and applied aspects of machine learning, statistics, convex optimization, and algorithms. Most recently he has worked on contextual bandits, reinforcement learning, and algorithmic fairness. He received his PhD from Princeton in 2007. He is a co-creator of the Fairlearn toolkit for assessing and improving the fairness of machine learning models and of the Maxent package for modeling species distributions, which is used by biologists around the world to design national parks, model the impacts of climate change, and discover new species.
More from the Same Authors
- 2021: Provably efficient exploration-free transfer RL for near-deterministic latent dynamics
  Yao Liu · Dipendra Misra · Miroslav Dudik · Robert Schapire
- 2022 Poster: Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning
  Alberto Bietti · Chen-Yu Wei · Miroslav Dudik · John Langford · Steven Wu
- 2022 Spotlight: Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning
  Alberto Bietti · Chen-Yu Wei · Miroslav Dudik · John Langford · Steven Wu
- 2021 Poster: Interactive Learning from Activity Description
  Khanh Nguyen · Dipendra Misra · Robert Schapire · Miroslav Dudik · Patrick Shafto
- 2021 Spotlight: Interactive Learning from Activity Description
  Khanh Nguyen · Dipendra Misra · Robert Schapire · Miroslav Dudik · Patrick Shafto
- 2020 Poster: Doubly robust off-policy evaluation with shrinkage
  Yi Su · Maria Dimakopoulou · Akshay Krishnamurthy · Miroslav Dudik
- 2019 Poster: Fair Regression: Quantitative Definitions and Reduction-Based Algorithms
  Alekh Agarwal · Miroslav Dudik · Steven Wu
- 2019 Oral: Fair Regression: Quantitative Definitions and Reduction-Based Algorithms
  Alekh Agarwal · Miroslav Dudik · Steven Wu
- 2019 Poster: Provably efficient RL with Rich Observations via Latent State Decoding
  Simon Du · Akshay Krishnamurthy · Nan Jiang · Alekh Agarwal · Miroslav Dudik · John Langford
- 2019 Oral: Provably efficient RL with Rich Observations via Latent State Decoding
  Simon Du · Akshay Krishnamurthy · Nan Jiang · Alekh Agarwal · Miroslav Dudik · John Langford
- 2018 Poster: Hierarchical Imitation and Reinforcement Learning
  Hoang Le · Nan Jiang · Alekh Agarwal · Miroslav Dudik · Yisong Yue · Hal Daumé III
- 2018 Poster: A Reductions Approach to Fair Classification
  Alekh Agarwal · Alina Beygelzimer · Miroslav Dudik · John Langford · Hanna Wallach
- 2018 Oral: Hierarchical Imitation and Reinforcement Learning
  Hoang Le · Nan Jiang · Alekh Agarwal · Miroslav Dudik · Yisong Yue · Hal Daumé III
- 2018 Oral: A Reductions Approach to Fair Classification
  Alekh Agarwal · Alina Beygelzimer · Miroslav Dudik · John Langford · Hanna Wallach
- 2018 Poster: Practical Contextual Bandits with Regression Oracles
  Dylan Foster · Alekh Agarwal · Miroslav Dudik · Haipeng Luo · Robert Schapire
- 2018 Oral: Practical Contextual Bandits with Regression Oracles
  Dylan Foster · Alekh Agarwal · Miroslav Dudik · Haipeng Luo · Robert Schapire
- 2017 Poster: Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
  Yu-Xiang Wang · Alekh Agarwal · Miroslav Dudik
- 2017 Talk: Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
  Yu-Xiang Wang · Alekh Agarwal · Miroslav Dudik