Off-policy evaluation (OPE) aims to estimate the return of a target policy using pre-collected observational data generated by a potentially different behavior policy. In many cases, there exist unmeasured variables that confound the action-reward or action-next-state relationships, rendering many existing OPE approaches ineffective. This paper develops an instrumental variable (IV)-based method for consistent OPE in confounded sequential decision making. As in single-stage decision making, we show that IV enables us to correctly identify the target policy's value in infinite-horizon settings as well. Furthermore, we propose a number of policy value estimators and illustrate their effectiveness through extensive simulations and an analysis of real data from a world-leading short-video platform.
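To make the single-stage case mentioned in the abstract concrete, the following is a minimal sketch of how an instrument can remove confounding bias. It is not the paper's estimator: the linear data-generating model, the coefficients, and the names `simulate`, `cov`, and `TRUE_EFFECT` are all assumptions chosen for illustration, and the IV estimate used here is the textbook ratio Cov(Z, R) / Cov(Z, A).

```python
import random

TRUE_EFFECT = 2.0  # assumed causal effect of the action on the reward


def cov(xs, ys):
    """Sample covariance (population normalisation) of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def simulate(n, seed=0):
    """Draw (instrument, action, reward) triples from a toy confounded model.

    The confounder u affects both the action and the reward but is never
    returned, mimicking an unmeasured variable. The instrument z affects the
    reward only through the action.
    """
    rng = random.Random(seed)
    z, a, r = [], [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)                    # unmeasured confounder
        z_i = 1.0 if rng.random() < 0.5 else 0.0   # binary instrument
        a_i = z_i + u + rng.gauss(0.0, 1.0)        # action depends on z and u
        r_i = TRUE_EFFECT * a_i + 3.0 * u + rng.gauss(0.0, 1.0)
        z.append(z_i)
        a.append(a_i)
        r.append(r_i)
    return z, a, r


z, a, r = simulate(100_000)

# Naive regression slope of reward on action: biased because u is omitted.
naive = cov(a, r) / cov(a, a)

# IV (Wald-type) estimate: uses only the variation in the action driven by z.
iv = cov(z, r) / cov(z, a)

print(f"true effect: {TRUE_EFFECT:.2f}")
print(f"naive slope: {naive:.2f}")
print(f"IV estimate: {iv:.2f}")
```

In this toy model the naive slope overstates the effect because the confounder pushes the action and the reward in the same direction, while the IV ratio stays close to the assumed true effect. The sequential, infinite-horizon setting studied in the paper requires additional structure that this sketch does not attempt to capture.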
Author Information
Yang Xu (North Carolina State University)
Jin Zhu (Sun Yat-Sen University)
Chengchun Shi (London School of Economics and Political Science)
Shikai Luo (Bytedance)
Rui Song (Amazon Inc)
More from the Same Authors
- 2023 Poster: Multiplier Bootstrap-based Exploration
  Runzhe Wan · Haoyu Wei · Branislav Kveton · Rui Song
- 2023 Poster: abess: A Fast Best-Subset Selection Library in Python and R
  Jin Zhu · Xueqin Wang · Liyuan Hu · Junhao Huang · Kangkang Jiang · Yanhang Zhang · Shiyun Lin · Junxian Zhu
- 2023 Poster: A Robust Test for the Stationarity Assumption in Sequential Decision Making
  Jitao Wang · Chengchun Shi · Zhenke Wu
- 2023 Poster: On Heterogeneous Treatment Effects in Heterogeneous Causal Graphs
  Richard Watson · Hengrui Cai · Xinming An · Samuel McLean · Rui Song
- 2023 Poster: A Reinforcement Learning Framework for Dynamic Mediation Analysis
  Lin Ge · Jitao Wang · Chengchun Shi · Zhenke Wu · Rui Song
- 2022 Poster: Safe Exploration for Efficient Policy Evaluation and Comparison
  Runzhe Wan · Branislav Kveton · Rui Song
- 2022 Spotlight: Safe Exploration for Efficient Policy Evaluation and Comparison
  Runzhe Wan · Branislav Kveton · Rui Song
- 2022 Poster: A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes
  Chengchun Shi · Masatoshi Uehara · Jiawei Huang · Nan Jiang
- 2022 Oral: A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes
  Chengchun Shi · Masatoshi Uehara · Jiawei Huang · Nan Jiang
- 2021 Poster: Deeply-Debiased Off-Policy Interval Estimation
  Chengchun Shi · Runzhe Wan · Victor Chernozhukov · Rui Song
- 2021 Oral: Deeply-Debiased Off-Policy Interval Estimation
  Chengchun Shi · Runzhe Wan · Victor Chernozhukov · Rui Song
- 2020 Poster: Does the Markov Decision Process Fit the Data: Testing for the Markov Property in Sequential Decision Making
  Chengchun Shi · Runzhe Wan · Rui Song · Wenbin Lu · Ling Leng