Oral
On the Design of Estimators for Bandit Off-Policy Evaluation
Nikos Vlassis · Aurelien Bibaut · Maria Dimakopoulou · Tony Jebara
Off-policy evaluation is the problem of estimating the value of a target policy using data collected under a different policy. Given a base estimator for bandit off-policy evaluation and a parametrized class of control variates, we address the problem of computing a control variate in that class that reduces the risk of the base estimator. We derive the population risk as a function of the class parameters and establish conditions that guarantee risk improvement. We present our main results in the context of multi-armed bandits, and we propose a simple design for contextual bandits that yields an estimator shown to perform well on multi-class cost-sensitive classification datasets.
Author Information
Nikos Vlassis (Netflix)
Aurelien Bibaut (UC Berkeley)
Maria Dimakopoulou (Netflix)
Tony Jebara (Netflix)
Related Events (a corresponding poster, oral, or spotlight)
2019 Poster: On the Design of Estimators for Bandit Off-Policy Evaluation
Thu Jun 13th 01:30 -- 04:00 AM Room Pacific Ballroom
More from the Same Authors
2019 Poster: More Efficient Off-Policy Evaluation through Regularized Targeted Learning
Aurelien Bibaut · Ivana Malenica · Nikos Vlassis · Mark van der Laan
2019 Oral: More Efficient Off-Policy Evaluation through Regularized Targeted Learning
Aurelien Bibaut · Ivana Malenica · Nikos Vlassis · Mark van der Laan
2018 Poster: Coordinated Exploration in Concurrent Reinforcement Learning
Maria Dimakopoulou · Benjamin Van Roy
2018 Oral: Coordinated Exploration in Concurrent Reinforcement Learning
Maria Dimakopoulou · Benjamin Van Roy