Reinforcement learning (RL) is a powerful technique that allows an autonomous agent to learn an optimal policy to maximize the expected return. The optimality of various RL algorithms relies on the stationarity assumption, which requires time-invariant state transition and reward functions. However, deviations from stationarity over extended periods often occur in real-world applications such as robotics control, healthcare, and digital marketing, resulting in suboptimal policies when learning proceeds under the stationarity assumption. In this paper, we propose a model-based doubly robust procedure for testing the stationarity assumption and detecting change points in offline RL settings with a certain degree of homogeneity. Our proposed testing procedure is robust to model misspecification and can effectively control the type-I error while achieving high statistical power, especially in high-dimensional settings. Extensive comparative simulations and a real-world interventional mobile health example illustrate the advantages of our method in detecting change points and optimizing long-term rewards in high-dimensional, non-stationary environments.
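The abstract summarizes the test only at a high level. As a loose, generic illustration of change-point scanning on offline trajectories (not the paper's model-based doubly robust statistic), the sketch below fits a simple linear reward model before and after each candidate change point and calibrates the maximal fit-improvement statistic with a permutation reference. The function names, the linear reward model, and the permutation calibration are all illustrative assumptions.

```python
# Generic, hypothetical change-point scan on offline RL data.
# NOT the paper's doubly robust test: it only asks whether a linear
# reward model r ~ phi(s, a) appears to shift at some time kappa.
import numpy as np


def fit_rss(X, y):
    """Least-squares fit of y on X; return the residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def scan_change_point(X, y, min_seg=20):
    """Return the split with the largest drop in RSS when the reward model
    is fitted separately before and after the split."""
    T = len(y)
    rss_full = fit_rss(X, y)
    best_kappa, best_stat = None, -np.inf
    for kappa in range(min_seg, T - min_seg):
        stat = rss_full - (fit_rss(X[:kappa], y[:kappa]) + fit_rss(X[kappa:], y[kappa:]))
        if stat > best_stat:
            best_kappa, best_stat = kappa, stat
    return best_kappa, best_stat


def permutation_pvalue(X, y, stat_obs, n_perm=100, seed=0):
    """Crude calibration: under stationarity the (X_t, y_t) pairs are
    exchangeable in time, so shuffle the time order and re-scan."""
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(y))
        _, stat = scan_change_point(X[idx], y[idx])
        exceed += stat >= stat_obs
    return (1 + exceed) / (1 + n_perm)


# Toy usage: the reward coefficients shift at t = 150.
rng = np.random.default_rng(1)
T, d = 300, 5
X = rng.normal(size=(T, d))                    # state-action features phi(s_t, a_t)
beta0, beta1 = rng.normal(size=d), rng.normal(size=d)
y = np.concatenate([X[:150] @ beta0, X[150:] @ beta1]) + 0.1 * rng.normal(size=T)
kappa_hat, stat = scan_change_point(X, y)
print(kappa_hat, permutation_pvalue(X, y, stat))
```

This sketch tests only the reward function; the paper's procedure additionally handles the transition dynamics and is doubly robust to misspecification of its nuisance models.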
Author Information
Jitao Wang (University of Michigan)
Chengchun Shi (London School of Economics and Political Science)
Zhenke Wu (University of Michigan, Ann Arbor)

Zhenke Wu’s research involves the development of statistical methods that inform health decisions made by individuals. He is particularly interested in scalable Bayesian methods that integrate multiple sources of evidence, with a focus on hierarchical latent variable modeling. He also works on sequential decision making by developing new statistical tools for reinforcement learning and micro-randomized trials. He has developed methods to estimate the etiology of childhood pneumonia, cause-of-death distributions using verbal autopsy, and autoantibody signatures for subsetting autoimmune disease patients, as well as to estimate time-varying causal effects of mobile prompts on lagged physical, mental, and behavioral health outcomes. Zhenke has developed original methods and software that are now used by investigators from research institutes such as the US CDC and Johns Hopkins, as well as site investigators from developing countries, e.g., Kenya, South Africa, Gambia, Mali, Zambia, Thailand, and Bangladesh. Zhenke completed a BS in Math at Fudan University in 2009 and a PhD in Biostatistics at Johns Hopkins University in 2014, and then stayed at Hopkins for his postdoctoral training. Since 2016, Zhenke has been Assistant Professor of Biostatistics and Research Assistant Professor in the Michigan Institute for Data Science (MIDAS) at the University of Michigan, Ann Arbor. When not thinking about statistics, you can often find him playing basketball, running, rock climbing, hiking, or downhill skiing.
More from the Same Authors
- 2023 Poster: An Instrumental Variable Approach to Confounded Off-Policy Evaluation
  Yang Xu · Jin Zhu · Chengchun Shi · Shikai Luo · Rui Song
- 2023 Poster: A Reinforcement Learning Framework for Dynamic Mediation Analysis
  Lin Ge · Jitao Wang · Chengchun Shi · Zhenke Wu · Rui Song
- 2022 Poster: A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes
  Chengchun Shi · Masatoshi Uehara · Jiawei Huang · Nan Jiang
- 2022 Oral: A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes
  Chengchun Shi · Masatoshi Uehara · Jiawei Huang · Nan Jiang
- 2021 Poster: Deeply-Debiased Off-Policy Interval Estimation
  Chengchun Shi · Runzhe Wan · Victor Chernozhukov · Rui Song
- 2021 Oral: Deeply-Debiased Off-Policy Interval Estimation
  Chengchun Shi · Runzhe Wan · Victor Chernozhukov · Rui Song
- 2020 Poster: Does the Markov Decision Process Fit the Data: Testing for the Markov Property in Sequential Decision Making
  Chengchun Shi · Runzhe Wan · Rui Song · Wenbin Lu · Ling Leng