
Doubly robust off-policy evaluation with shrinkage
Yi Su · Maria Dimakopoulou · Akshay Krishnamurthy · Miroslav Dudik

Thu Jul 16 07:00 AM -- 07:45 AM & Thu Jul 16 08:00 PM -- 08:45 PM (PDT)

We propose a new framework for designing estimators for off-policy evaluation in contextual bandits. Our approach is based on the asymptotically optimal doubly robust estimator, but we shrink the importance weights to minimize a bound on the mean squared error, which results in a better bias-variance tradeoff in finite samples. We use this optimization-based framework to obtain three estimators: (a) a weight-clipping estimator, (b) a new weight-shrinkage estimator, and (c) the first shrinkage-based estimator for combinatorial action sets. Extensive experiments in both standard and combinatorial bandit benchmark problems show that our estimators are highly adaptive and typically outperform state-of-the-art methods.
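To make the idea concrete, the following is a minimal sketch of a doubly robust estimate in which the importance weight is clipped before it multiplies the reward-model residual. It is an illustration under stated assumptions, not the authors' implementation: the function name, argument layout, and the hand-set `clip` threshold are all hypothetical, and the abstract's procedure for choosing the shrinkage by minimizing a bound on the mean squared error is not reproduced here.

```python
def clipped_dr_estimate(rewards, actions, logging_probs, target_probs, q_hat, clip):
    """Doubly robust value estimate with clipped importance weights (sketch).

    All names are hypothetical. Per logged sample i:
      rewards[i]       observed reward
      actions[i]       index of the logged action
      logging_probs[i] probability the logging policy gave to that action
      target_probs[i]  list of target-policy probabilities over all actions
      q_hat[i]         list of reward-model predictions over all actions
    clip is the weight threshold, set by hand here (an assumption).
    """
    total = 0.0
    for r, a, mu, pi, q in zip(rewards, actions, logging_probs, target_probs, q_hat):
        # Importance weight of the logged action, then clipped from above.
        w = pi[a] / mu
        w_hat = min(w, clip)
        # Direct-method term: model-predicted reward under the target policy.
        direct = sum(p * q_a for p, q_a in zip(pi, q))
        # Correction term: clipped weight times the model's residual.
        total += direct + w_hat * (r - q[a])
    return total / len(rewards)


# Toy data: two logged samples, two actions, reward model predicting zero.
rewards = [1.0, 0.0]
actions = [0, 1]
logging_probs = [0.5, 0.5]
target_probs = [[1.0, 0.0], [1.0, 0.0]]
q_hat = [[0.0, 0.0], [0.0, 0.0]]

unclipped = clipped_dr_estimate(rewards, actions, logging_probs, target_probs, q_hat, float("inf"))
clipped = clipped_dr_estimate(rewards, actions, logging_probs, target_probs, q_hat, 1.0)
model_only = clipped_dr_estimate(rewards, actions, logging_probs, target_probs, q_hat, 0.0)
```

In this sketch an infinite threshold gives the standard doubly robust estimate, while a threshold of zero falls back to the reward model alone; intermediate values trade some bias for lower variance.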

Author Information

Yi Su (Cornell University)
Maria Dimakopoulou (Stanford University)
Akshay Krishnamurthy (Microsoft Research)
Miroslav Dudik (Microsoft Research)
