An Instrumental Variable Approach to Confounded Off-Policy Evaluation
Yang Xu · Jin Zhu · Chengchun Shi · Shikai Luo · Rui Song

Tue Jul 25 05:00 PM -- 06:30 PM (PDT) @ Exhibit Hall 1 #605

Off-policy evaluation (OPE) aims to estimate the return of a target policy using pre-collected observational data generated by a potentially different behavior policy. In many cases, unmeasured variables confound the action-reward or action-next-state relationships, rendering many existing OPE approaches ineffective. This paper develops an instrumental variable (IV)-based method for consistent OPE in confounded sequential decision making. We show that, as in single-stage decision making, an IV enables us to correctly identify the target policy's value in infinite-horizon settings. Furthermore, we propose several policy value estimators and illustrate their effectiveness through extensive simulations and a real data analysis from a world-leading short-video platform.
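
To make the single-stage intuition concrete, the following is a minimal toy sketch, not one of the paper's estimators. It assumes a binary instrument `z` that shifts the action but is independent of the unmeasured confounder `u`, and a constant action effect `tau`; the function name `simulate` and all coefficients are hypothetical choices made only for illustration. Under these assumptions, a naive difference in mean rewards between actions is biased by `u`, while the classical Wald ratio (reduced-form effect divided by first-stage effect) should land close to `tau`.

```python
import random


def simulate(n=200_000, tau=1.0, seed=0):
    """Toy single-stage confounded data; returns (naive, wald) effect estimates.

    All modelling choices here are illustrative assumptions, not the paper's setup.
    """
    rng = random.Random(seed)
    # Running sums and counts, indexed by the binary value of A or Z.
    sum_r_by_a, cnt_a = [0.0, 0.0], [0, 0]
    sum_r_by_z, sum_a_by_z, cnt_z = [0.0, 0.0], [0.0, 0.0], [0, 0]

    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0        # instrument, independent of u
        u = rng.gauss(0.0, 1.0)                   # unmeasured confounder
        # The action depends on both the instrument and the confounder.
        a = 1 if 1.5 * z + u + rng.gauss(0.0, 1.0) > 0.75 else 0
        # The reward depends on the action and the confounder, but not directly on z.
        r = tau * a + 2.0 * u + rng.gauss(0.0, 1.0)

        sum_r_by_a[a] += r
        cnt_a[a] += 1
        sum_r_by_z[z] += r
        sum_a_by_z[z] += a
        cnt_z[z] += 1

    # Naive estimate: compare mean rewards across observed actions (confounded by u).
    naive = sum_r_by_a[1] / cnt_a[1] - sum_r_by_a[0] / cnt_a[0]

    # Wald estimate: effect of z on reward divided by effect of z on action.
    reduced_form = sum_r_by_z[1] / cnt_z[1] - sum_r_by_z[0] / cnt_z[0]
    first_stage = sum_a_by_z[1] / cnt_z[1] - sum_a_by_z[0] / cnt_z[0]
    wald = reduced_form / first_stage
    return naive, wald


if __name__ == "__main__":
    naive, wald = simulate()
    print(f"true effect: 1.0  naive: {naive:.3f}  wald (IV): {wald:.3f}")
```

The sketch covers only the one-step case. The sequential, infinite-horizon setting described in the abstract additionally involves confounded state transitions, which this toy example does not model.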

Author Information

Yang Xu (North Carolina State University)
Jin Zhu (Sun Yat-Sen University)
Chengchun Shi (London School of Economics and Political Science)
Shikai Luo (ByteDance)
Rui Song (Amazon Inc)

