Off-policy evaluation is the problem of estimating the value of a target policy using data collected under a different policy. Given a base estimator for bandit off-policy evaluation and a parametrized class of control variates, we address the problem of computing a control variate in that class that reduces the risk of the base estimator. We derive the population risk as a function of the class parameters and establish conditions that guarantee risk improvement. We present our main results in the context of multi-armed bandits, and we propose a simple design for contextual bandits that yields an estimator shown to perform well on multi-class cost-sensitive classification datasets.
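To make the setting concrete, here is a minimal sketch (not the paper's method) of off-policy evaluation for a multi-armed bandit: a standard inverse-propensity-scoring (IPS) base estimator adjusted by a simple control variate. The control variate used here is the importance weight itself, which has known expectation 1 under the logging policy, and its coefficient is fit from the data to reduce variance; all names, policies, and the specific control-variate design below are illustrative assumptions, not the parametrized class studied in the paper.

```python
import numpy as np

# Illustrative sketch: IPS off-policy estimate for a multi-armed bandit,
# adjusted by a control variate with a variance-minimizing coefficient.
rng = np.random.default_rng(0)
n_arms, n = 5, 10_000

logging_policy = np.full(n_arms, 1.0 / n_arms)        # behavior (logging) policy
target_policy = np.array([0.6, 0.1, 0.1, 0.1, 0.1])   # policy to evaluate
true_means = np.array([0.8, 0.2, 0.5, 0.4, 0.3])      # per-arm reward means (unknown in practice)

# Log data under the behavior policy.
actions = rng.choice(n_arms, size=n, p=logging_policy)
rewards = rng.binomial(1, true_means[actions]).astype(float)

# Base IPS estimator: weight each logged reward by pi_target / pi_logging.
weights = target_policy[actions] / logging_policy[actions]
ips_terms = weights * rewards

# Control variate: the importance weights, whose expectation is known to be 1.
cv = weights - 1.0
beta = np.cov(ips_terms, cv)[0, 1] / np.var(cv, ddof=1)  # variance-minimizing coefficient
adjusted = ips_terms - beta * cv

true_value = float(target_policy @ true_means)
print(f"true value   : {true_value:.4f}")
print(f"IPS estimate : {ips_terms.mean():.4f} (SE {ips_terms.std(ddof=1) / np.sqrt(n):.4f})")
print(f"CV-adjusted  : {adjusted.mean():.4f} (SE {adjusted.std(ddof=1) / np.sqrt(n):.4f})")
```

Both estimators are unbiased for the target policy's value; the control-variate adjustment trades a small amount of estimation of beta for a reduction in variance, which is the kind of risk improvement the paper analyzes for a general parametrized class.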