Timezone: »

 
Miro Dudík (Microsoft Research) - Doubly Robust Off-policy Evaluation with Shrinkage
Miro Dudik
Event URL: https://www.microsoft.com/en-us/research/people/mdudik/ »

Contextual bandits are a learning protocol that encompasses applications such as news recommendation, advertising, and mobile health, where an algorithm repeatedly observes some information about a user, makes a decision what content to present, and accrues a reward if the presented content is successful. In this talk, I will focus on the fundamental task of evaluating a new policy given historic data. I will describe the asymptotically optimal approach of doubly robust (DR) estimation, the reasons for its shortcomings in finite samples, and how to overcome these shortcomings by directly optimizing the bound on finite-sample error. Optimization yields a new family of estimators that, similarly to DR, leverage any direct model of rewards, but shrink importance weights to obtain a better bias-variance tradeoff than DR. Error bounds can also be used to select the best among multiple reward predictors. Somewhat surprisingly, reward predictors that work best with standard DR are not the same as those that work best with our modified DR. Our new estimator and model selection procedure perform extremely well across a wide variety of settings, so we expect they will enjoy broad practical use.

Based on joint work with Yi Su, Maria Dimakopoulou, and Akshay Krishnamurthy.

Author Information

Miro Dudik (Microsoft Research)
Miro Dudik

Miroslav Dudík is a Senior Principal Researcher in machine learning at Microsoft Research, NYC. His research focuses on combining theoretical and applied aspects of machine learning, statistics, convex optimization, and algorithms. Most recently he has worked on contextual bandits, reinforcement learning, and algorithmic fairness. He received his PhD from Princeton in 2007. He is a co-creator of the Fairlearn toolkit for assessing and improving the fairness of machine learning models and of the Maxentpackage for modeling species distributions, which is used by biologists around the world to design national parks, model the impacts of climate change, and discover new species.

More from the Same Authors