Doubly robust off-policy evaluation with shrinkage
Yi Su · Maria Dimakopoulou · Akshay Krishnamurthy · Miro Dudik

Thu Jul 16 07:00 AM -- 07:45 AM & Thu Jul 16 08:00 PM -- 08:45 PM (PDT)

We propose a new framework for designing estimators for off-policy evaluation in contextual bandits. Our approach is based on the asymptotically optimal doubly robust estimator, but we shrink the importance weights to minimize a bound on the mean squared error, which results in a better bias-variance tradeoff in finite samples. We use this optimization-based framework to obtain three estimators: (a) a weight-clipping estimator, (b) a new weight-shrinkage estimator, and (c) the first shrinkage-based estimator for combinatorial action sets. Extensive experiments in both standard and combinatorial bandit benchmark problems show that our estimators are highly adaptive and typically outperform state-of-the-art methods.
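As a rough illustration of the weight-clipping idea mentioned in the abstract, the sketch below computes a doubly robust value estimate in which each importance weight is capped at a threshold before it multiplies the reward-model residual. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `dr_clipped`, its argument names, and the threshold parameter `clip` are hypothetical, and the abstract does not specify how the threshold is chosen, so it is left as a plain input here.

```python
import numpy as np

def dr_clipped(rewards, target_probs, logging_probs, q_logged, q_target, clip=np.inf):
    """Doubly robust estimate with clipped importance weights (illustrative sketch).

    rewards       : observed reward for each logged (context, action) pair
    target_probs  : probability the target policy assigns to the logged action
    logging_probs : probability the logging policy assigned to the logged action
    q_logged      : reward-model prediction for the logged action
    q_target      : reward-model prediction averaged over the target policy's actions
    clip          : hypothetical threshold; np.inf gives the unclipped doubly robust
                    estimate, 0 gives the pure reward-model (direct method) estimate
    """
    rewards = np.asarray(rewards, dtype=float)
    q_logged = np.asarray(q_logged, dtype=float)
    q_target = np.asarray(q_target, dtype=float)

    # Standard importance weights, then capped at the threshold.
    weights = np.asarray(target_probs, dtype=float) / np.asarray(logging_probs, dtype=float)
    clipped = np.minimum(weights, clip)

    # Model-based baseline plus a weighted correction on the model's residual.
    return float(np.mean(q_target + clipped * (rewards - q_logged)))


# Toy logged data (made up purely for illustration).
rewards = [1.0, 0.0, 1.0, 1.0]
target_probs = [0.5, 0.5, 1.0, 0.2]
logging_probs = [0.25, 0.5, 0.5, 0.4]
q_logged = [0.5, 0.5, 0.5, 0.5]
q_target = [0.6, 0.6, 0.6, 0.6]

for threshold in (np.inf, 1.0, 0.0):
    print(threshold, dr_clipped(rewards, target_probs, logging_probs,
                                q_logged, q_target, clip=threshold))
```

Lowering the threshold moves the estimate from the unclipped doubly robust value toward the reward model's own prediction, which is the bias-variance tradeoff the abstract refers to.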

Author Information

Yi Su (Cornell University)
Maria Dimakopoulou (Stanford University)
Akshay Krishnamurthy (Microsoft Research)
Miro Dudik (Microsoft Research)
Miro Dudik

Miroslav Dudík is a Senior Principal Researcher in machine learning at Microsoft Research, NYC. His research focuses on combining theoretical and applied aspects of machine learning, statistics, convex optimization, and algorithms. Most recently he has worked on contextual bandits, reinforcement learning, and algorithmic fairness. He received his PhD from Princeton in 2007. He is a co-creator of the Fairlearn toolkit for assessing and improving the fairness of machine learning models and of the Maxent package for modeling species distributions, which is used by biologists around the world to design national parks, model the impacts of climate change, and discover new species.

