Accountable Off-Policy Evaluation With Kernel Bellman Statistics

Yihao Feng · Tongzheng Ren · Ziyang Tang · Qiang Liu

Keywords: [ Reinforcement Learning ] [ Accountability, Transparency and Interpretability ] [ Safety ] [ Reinforcement Learning - General ]


Off-policy evaluation plays an important role in modern reinforcement learning. However, most existing off-policy evaluation algorithms focus on point estimation, without providing an accountable confidence interval that can reflect the uncertainty caused by limited observed data and algorithmic errors. In this work, we propose a new optimization-based framework that can find a feasible set containing the true value function with high probability, by leveraging the statistical properties of the recently proposed kernel Bellman loss (Feng et al., 2019). We further use this feasible set to construct accountable confidence intervals for off-policy evaluation, and propose a post-hoc diagnosis for existing estimators. Empirical results show that our methods yield tight yet accountable confidence intervals in different settings, demonstrating the effectiveness of our approach.
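The kernel Bellman loss underlying the feasible set can be sketched as a kernelized quadratic form over Bellman residuals. The following is a minimal illustration, not the authors' implementation: the function names, the RBF kernel choice, and the `bandwidth` parameter are assumptions made here for concreteness; the abstract does not specify these details.

```python
import numpy as np

def rbf_kernel(X, Y, bandwidth=1.0):
    # Pairwise RBF kernel between rows of X and rows of Y.
    # (Illustrative kernel choice; any positive-definite kernel works.)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def kernel_bellman_loss(q_values, rewards, next_q_values, features, gamma=0.99):
    # Bellman residuals on observed transitions (s_i, a_i, r_i, s'_i):
    #   delta_i = Q(s_i, a_i) - (r_i + gamma * E_{a'~pi} Q(s'_i, a'))
    # Here `next_q_values` is assumed to already hold the expectation
    # under the target policy pi.
    delta = q_values - (rewards + gamma * next_q_values)
    # Kernelized quadratic form (a V-statistic over the residuals);
    # it is non-negative for a positive-definite kernel and is zero
    # (in expectation) only when Q satisfies the Bellman equation.
    K = rbf_kernel(features, features)
    n = delta.shape[0]
    return float(delta @ K @ delta) / (n * n)
```

A feasible set in the paper's spirit would then collect all candidate Q whose kernel Bellman loss falls below a data-dependent threshold chosen from the statistic's concentration properties; the interval endpoints come from minimizing and maximizing the policy value over that set.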
