Deeply-Debiased Off-Policy Interval Estimation

Chengchun Shi · Runzhe Wan · Victor Chernozhukov · Rui Song

Keywords: [ Learning Theory ] [ Reinforcement Learning and Planning ]

Poster: Spot B4 in Virtual World, Tue 20 Jul, 9 a.m. to 11 a.m. PDT
Oral presentation: Deep Reinforcement Learning 2, Tue 20 Jul, 5 a.m. to 6 a.m. PDT


Off-policy evaluation estimates a target policy's value from a historical dataset generated by a different behavior policy. In addition to a point estimate, many applications benefit from a confidence interval (CI) that quantifies the uncertainty of that estimate. In this paper, we propose a novel procedure to construct an efficient, robust, and flexible CI on a target policy's value. Our method is justified by theoretical results and numerical experiments. A Python implementation of the proposed procedure is available at RunzheStat/D2OPE.
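
To make the setting concrete, the sketch below shows what a point estimate and a CI for a target policy's value can look like in the simplest case. It is not the procedure proposed in the paper: it assumes a two-armed bandit with known behavior-policy probabilities, and it swaps in plain importance sampling with a normal-approximation (Wald) interval. The function names `is_value_ci` and `simulate`, and all numbers in the simulation, are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def is_value_ci(rewards, target_probs, behavior_probs, z=1.96):
    """Importance-sampling value estimate with a normal-approximation CI.

    rewards[i]        : reward observed for the i-th logged action
    target_probs[i]   : probability the target policy gives that action
    behavior_probs[i] : probability the behavior policy gave it (must be > 0)
    z                 : normal quantile (1.96 gives a nominal 95% interval)
    """
    # Reweight each logged reward by the target/behavior probability ratio.
    weighted = [r * p / b for r, p, b in zip(rewards, target_probs, behavior_probs)]
    n = len(weighted)
    estimate = statistics.fmean(weighted)
    # Standard error of the mean of the reweighted rewards.
    std_err = statistics.stdev(weighted) / math.sqrt(n)
    return estimate, (estimate - z * std_err, estimate + z * std_err)


def simulate(n=5000, seed=0):
    """Log data under a behavior policy, then evaluate a different target policy."""
    rng = random.Random(seed)
    mean_reward = {0: 0.2, 1: 0.8}  # hypothetical Bernoulli reward means per arm
    behavior = {0: 0.7, 1: 0.3}     # policy that generated the historical data
    target = {0: 0.1, 1: 0.9}       # policy whose value we want to estimate
    rewards, target_probs, behavior_probs = [], [], []
    for _ in range(n):
        action = 0 if rng.random() < behavior[0] else 1
        reward = 1.0 if rng.random() < mean_reward[action] else 0.0
        rewards.append(reward)
        target_probs.append(target[action])
        behavior_probs.append(behavior[action])
    # Ground truth is available here only because the simulator is known.
    true_value = sum(target[a] * mean_reward[a] for a in (0, 1))
    return is_value_ci(rewards, target_probs, behavior_probs), true_value


if __name__ == "__main__":
    (estimate, (lower, upper)), true_value = simulate()
    print(f"estimate={estimate:.3f}  CI=({lower:.3f}, {upper:.3f})  true={true_value:.3f}")
```

This baseline relies on knowing the behavior policy and on a single-step problem. In sequential settings the importance weights multiply across time steps and the interval can become very wide, which is part of why more refined CI constructions are of interest.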
