A Robust Test for the Stationarity Assumption in Sequential Decision Making
Jitao Wang · Chengchun Shi · Zhenke Wu

Tue Jul 25 02:00 PM -- 04:30 PM (PDT) @ Exhibit Hall 1 #514

Reinforcement learning (RL) is a powerful technique that allows an autonomous agent to learn an optimal policy to maximize the expected return. The optimality of various RL algorithms relies on the stationarity assumption, which requires time-invariant state transition and reward functions. However, deviations from stationarity over extended periods often occur in real-world applications like robotics control, health care and digital marketing, so policies learned under the stationarity assumption can be suboptimal. In this paper, we propose a model-based doubly robust procedure for testing the stationarity assumption and detecting change points in offline RL settings with a certain degree of homogeneity. Our proposed testing procedure is robust to model misspecifications and can effectively control type-I error while achieving high statistical power, especially in high-dimensional settings. Extensive comparative simulations and a real-world interventional mobile health example illustrate the advantages of our method in detecting change points and optimizing long-term rewards in high-dimensional, non-stationary environments.
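To make the idea of a change point concrete, the sketch below scans a reward sequence for a single shift in its mean using a CUSUM-type statistic. It is only an illustration under simplifying assumptions and is not the doubly robust procedure proposed in the paper: it ignores states and actions, looks at rewards alone, and returns no calibrated p-value. The function name `cusum_change_point` and the simulated data are hypothetical.

```python
import random


def cusum_change_point(rewards):
    """Return (k, statistic) for the most likely single mean shift.

    For each candidate split k, the mean of rewards[:k] is compared with the
    mean of rewards[k:]. The difference is scaled by sqrt(k * (n - k) / n) so
    that splits near the edges are not favored by chance alone.
    """
    n = len(rewards)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(rewards)
    best_k, best_stat = 1, 0.0
    left_sum = 0.0
    for k in range(1, n):
        left_sum += rewards[k - 1]
        left_mean = left_sum / k
        right_mean = (total - left_sum) / (n - k)
        stat = (k * (n - k) / n) ** 0.5 * abs(left_mean - right_mean)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat


# Hypothetical data: the mean reward shifts from 0 to 2 after 60 time points.
rng = random.Random(0)
rewards = [rng.gauss(0.0, 1.0) for _ in range(60)]
rewards += [rng.gauss(2.0, 1.0) for _ in range(40)]

k_hat, stat = cusum_change_point(rewards)
print(f"estimated change point: {k_hat}, statistic: {stat:.2f}")
```

A large statistic suggests that the reward distribution is not time-invariant. Turning it into a formal test would require a reference distribution, for example from permutation or bootstrap resampling, which this sketch omits.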

Author Information

Jitao Wang (University of Michigan)
Chengchun Shi (London School of Economics and Political Science)
Zhenke Wu (University of Michigan, Ann Arbor)
Zhenke Wu

Zhenke Wu’s research involves the development of statistical methods that inform health decisions made by individuals. He is particularly interested in scalable Bayesian methods that integrate multiple sources of evidence, with a focus on hierarchical latent variable modeling. He also works on sequential decision making by developing new statistical tools for reinforcement learning and micro-randomized trials. He has developed methods to estimate the etiology of childhood pneumonia, cause-of-death distributions using verbal autopsy, and autoantibody signatures for subsetting autoimmune disease patients, as well as time-varying causal effects of mobile prompts upon lagged physical, mental and behavioral health outcomes. Zhenke has developed original methods and software that are now used by investigators from research institutes such as US CDC and Johns Hopkins, as well as site investigators from developing countries, e.g., Kenya, South Africa, Gambia, Mali, Zambia, Thailand and Bangladesh. Zhenke completed a BS in Math at Fudan University in 2009 and a PhD in Biostatistics at the Johns Hopkins University in 2014, and then stayed at Hopkins for his postdoctoral training. Since 2016, Zhenke has been an Assistant Professor of Biostatistics and a Research Assistant Professor in the Michigan Institute for Data Science (MIDAS) at the University of Michigan, Ann Arbor. When not thinking about Statistics, he can often be found playing basketball, running, rock climbing, hiking, or downhill skiing.

More from the Same Authors