An Instrumental Variable Approach to Confounded Off-Policy Evaluation

Yang Xu · Jin Zhu · Chengchun Shi · Shikai Luo · Rui Song

Exhibit Hall 1 #605


Off-policy evaluation (OPE) aims to estimate the return of a target policy using pre-collected observational data generated by a potentially different behavior policy. In many cases, unmeasured variables confound the action-reward or action-next-state relationships, rendering many existing OPE approaches ineffective. This paper develops an instrumental variable (IV)-based method for consistent OPE in confounded sequential decision making. We show that, as in single-stage decision making, IVs enable correct identification of the target policy's value in infinite-horizon settings. Furthermore, we propose several policy value estimators and illustrate their effectiveness through extensive simulations and a real data analysis from a world-leading short-video platform.
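
To make the single-stage intuition concrete, the following is a minimal sketch of how an instrument can remove confounding bias. It is not the paper's estimator: it uses the textbook Wald (ratio) IV estimator on synthetic data, and the data-generating process, the coefficients, and the function names `simulate`, `naive_effect`, and `wald_effect` are all assumptions made for illustration.

```python
import numpy as np

def simulate(n, seed=0):
    """Synthetic single-stage data (hypothetical data-generating process)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)              # unmeasured confounder
    z = rng.integers(0, 2, size=n)      # binary instrument, independent of u
    # The action depends on both the instrument and the confounder.
    a = (0.8 * z + 0.8 * u + rng.normal(size=n) > 0.4).astype(float)
    # The reward depends on the action (true effect 1.0) and the confounder,
    # but on the instrument only through the action.
    r = 1.0 * a + 2.0 * u + rng.normal(size=n)
    return z, a, r

def naive_effect(a, r):
    """Difference in mean reward between actions; biased when u is unobserved."""
    return r[a == 1].mean() - r[a == 0].mean()

def wald_effect(z, a, r):
    """Wald IV estimator: reward contrast across z divided by action contrast across z."""
    num = r[z == 1].mean() - r[z == 0].mean()
    den = a[z == 1].mean() - a[z == 0].mean()
    return num / den

if __name__ == "__main__":
    z, a, r = simulate(200_000)
    print("naive estimate:", naive_effect(a, r))
    print("IV (Wald) estimate:", wald_effect(z, a, r))
```

In this toy setup the confounder raises both the chance of taking the action and the reward, so the naive contrast overstates the effect, while the Wald ratio should land close to the true value of 1.0 because the instrument is independent of the confounder. Extending this idea to sequential, infinite-horizon settings is the contribution described in the abstract and is not attempted here.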
