The Adaptive Doubly Robust Estimator for Policy Evaluation in Adaptive Experiments
Masahiro Kato · Shota Yasui · Kenichiro McAlinn

We consider policy evaluation with dependent samples gathered from adaptive experiments. To deal with the dependency, existing studies, such as van der Laan (2008), proposed estimators that include an inverse probability weight and whose score function has a martingale property. However, these estimators require the true logging policy (the probability of choosing an action) in order to use the martingale property. To relax this often-overlooked assumption, we propose a doubly robust (DR) estimator for dependent samples, which consists of two nuisance estimators: one for the conditional mean outcome and one for the logging policy. To obtain an asymptotically normal semiparametric estimator from dependent samples without requiring Donsker nuisance estimators or the martingale property, we propose adaptive-fitting, a variant of the sample-splitting that Chernozhukov et al. (2018) proposed for independent and identically distributed samples. We confirm the empirical performance through simulation studies and report that the DR estimator also has a stabilization effect.
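The idea of combining two nuisance estimators with adaptive-fitting can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a context-free setting with a small number of actions, uses running sample means for the conditional mean outcome and smoothed past action frequencies for the logging policy, and clips the estimated logging probabilities at a floor. The function names, the `floor` parameter, and the epsilon-greedy simulator are all hypothetical choices made for illustration. The one feature taken from the abstract is that the nuisance estimates at step t are built only from samples observed before step t.

```python
import random


def dr_value_adaptive(actions, rewards, eval_policy, n_actions, floor=0.05):
    """Doubly robust value estimate; nuisances at step t use only steps < t.

    Hypothetical simplification: no covariates, so the conditional mean
    outcome reduces to one mean per action.
    """
    counts = [0] * n_actions   # how often each action was chosen so far
    sums = [0.0] * n_actions   # cumulative reward per action so far
    total = 0.0
    for t, (a, y) in enumerate(zip(actions, rewards)):
        # Outcome model: running mean per action from past samples (0.0 if unseen).
        f_hat = [sums[k] / counts[k] if counts[k] else 0.0 for k in range(n_actions)]
        # Logging-policy model: smoothed past action frequencies, clipped below
        # at `floor` (an assumed safeguard against very small denominators).
        g_hat = [max((counts[k] + 1) / (t + n_actions), floor) for k in range(n_actions)]
        # Direct-method term plus inverse-probability-weighted residual correction.
        direct = sum(eval_policy[k] * f_hat[k] for k in range(n_actions))
        correction = eval_policy[a] * (y - f_hat[a]) / g_hat[a]
        total += direct + correction
        # Update the history only after step t has been scored.
        counts[a] += 1
        sums[a] += y
    return total / len(actions)


def simulate_epsilon_greedy(n, means, eps=0.2, noise_sd=0.5, seed=0):
    """Toy adaptive experiment: epsilon-greedy logging with Gaussian rewards."""
    rng = random.Random(seed)
    k = len(means)
    counts, sums = [0] * k, [0.0] * k
    actions, rewards = [], []
    for _ in range(n):
        if rng.random() < eps or 0 in counts:
            a = rng.randrange(k)  # explore (also forced until every action is tried)
        else:
            a = max(range(k), key=lambda j: sums[j] / counts[j])  # exploit
        y = rng.gauss(means[a], noise_sd)
        counts[a] += 1
        sums[a] += y
        actions.append(a)
        rewards.append(y)
    return actions, rewards


actions, rewards = simulate_epsilon_greedy(5000, means=[0.2, 0.8])
estimate = dr_value_adaptive(actions, rewards, eval_policy=[0.5, 0.5], n_actions=2)
print(estimate)  # the true value under these assumed means is 0.5
```

In this sketch the logging policy is never supplied to the estimator. It is estimated from the action history, which mirrors the motivation stated above. Because the toy outcome model converges to the per-action means, the residual correction shrinks over time even when the estimated logging probabilities are imperfect.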

Author Information

Masahiro Kato (Cyberagent)
Shota Yasui (Cyberagent)
Kenichiro McAlinn (Temple University)
