Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings
Ming Yin · Yu-Xiang Wang
This work studies the statistical limits of uniform convergence for offline policy evaluation (OPE) with model-based methods in episodic MDPs, and provides a unified framework for optimal learning in several well-motivated offline tasks. We establish an $\Omega(H^2 S/d_m\epsilon^2)$ lower bound (over the model-based family) for global uniform OPE, and our main result establishes an upper bound of $\tilde{O}(H^2/d_m\epsilon^2)$ for \emph{local} uniform convergence. The key to achieving the optimal rate $\tilde{O}(H^2/d_m\epsilon^2)$ is our design of the \emph{singleton absorbing MDP}, a new sharp analysis tool that works with the model-based approach. We generalize this model-based framework to two new settings, offline task-agnostic and offline reward-free learning, with optimal complexities $\tilde{O}(H^2\log(K)/d_m\epsilon^2)$ (where $K$ is the number of tasks) and $\tilde{O}(H^2S/d_m\epsilon^2)$, respectively. These results provide a unified solution for simultaneously solving different offline RL problems.
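As a rough illustration of what a model-based (plug-in) OPE estimator looks like in a tabular, time-homogeneous episodic MDP, the sketch below pools offline transitions across all time steps to estimate one transition kernel, then evaluates a policy by backward induction on the estimated model. It is a minimal sketch under stated assumptions, not the authors' algorithm or analysis: the function names `estimate_model` and `evaluate_policy`, the `(s, a, r, s_next)` data layout, the restriction to deterministic stationary policies, and the zero-fill for unvisited state-action pairs are all hypothetical choices made for illustration.

```python
def estimate_model(episodes, n_states, n_actions):
    """Estimate transitions and mean rewards from offline episodes.

    Each episode is a list of (s, a, r, s_next) tuples. Counts are pooled
    across time steps, which relies on the time-homogeneous assumption.
    """
    counts = [[[0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    reward_sum = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    for episode in episodes:
        for s, a, r, s_next in episode:
            counts[s][a][s_next] += 1
            reward_sum[s][a] += r
            visits[s][a] += 1

    # Assumption: unvisited (s, a) pairs keep zero transition mass and zero
    # reward. This is a placeholder choice, not something the source specifies.
    p_hat = [[[0.0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    r_hat = [[0.0] * n_actions for _ in range(n_states)]
    for s in range(n_states):
        for a in range(n_actions):
            n = visits[s][a]
            if n > 0:
                p_hat[s][a] = [c / n for c in counts[s][a]]
                r_hat[s][a] = reward_sum[s][a] / n
    return p_hat, r_hat


def evaluate_policy(p_hat, r_hat, policy, horizon, init_dist):
    """Plug-in value of a deterministic stationary policy over `horizon` steps.

    `policy[s]` is the action taken in state s; `init_dist[s]` is the
    probability of starting in state s.
    """
    n_states = len(p_hat)
    value = [0.0] * n_states  # value with zero steps remaining
    for _ in range(horizon):
        new_value = []
        for s in range(n_states):
            a = policy[s]
            expected_next = sum(p * v for p, v in zip(p_hat[s][a], value))
            new_value.append(r_hat[s][a] + expected_next)
        value = new_value
    return sum(d * v for d, v in zip(init_dist, value))


# Toy data: state 0 moves to state 1 with reward 1; state 1 is absorbing.
episodes = [[(0, 0, 1.0, 1), (1, 0, 0.0, 1)] for _ in range(3)]
p_hat, r_hat = estimate_model(episodes, n_states=2, n_actions=1)
estimated_value = evaluate_policy(p_hat, r_hat, policy=[0, 0], horizon=2,
                                  init_dist=[1.0, 0.0])
```

Because the same estimated model is reused for every policy passed to `evaluate_policy`, this kind of estimator is a natural object for uniform-convergence questions: the concern is how far the plug-in values can deviate from the true values simultaneously over a class of policies.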
Author Information
Ming Yin (UC Santa Barbara)
Yu-Xiang Wang (UC Santa Barbara)

Yu-Xiang Wang is the Eugene Aas Assistant Professor of Computer Science at UCSB. He runs the Statistical Machine Learning lab and co-founded the UCSB Center for Responsible Machine Learning. He is also visiting Amazon Web Services. Yu-Xiang’s research interests include statistical theory and methodology, differential privacy, reinforcement learning, online learning and deep learning.
More from the Same Authors
- 2021: Privately Publishable Per-instance Privacy: An Extended Abstract
  Rachel Redberg · Yu-Xiang Wang
- 2021: Optimal Accounting of Differential Privacy via Characteristic Function
  Yuqing Zhu · Jinshuo Dong · Yu-Xiang Wang
- 2021: Near-Optimal Offline Reinforcement Learning via Double Variance Reduction
  Ming Yin · Yu Bai · Yu-Xiang Wang
- 2022: Optimal Dynamic Regret in LQR Control
  Dheeraj Baby · Yu-Xiang Wang
- 2022 Poster: Sample-Efficient Reinforcement Learning with loglog(T) Switching Cost
  Dan Qiao · Ming Yin · Ming Min · Yu-Xiang Wang
- 2022 Spotlight: Sample-Efficient Reinforcement Learning with loglog(T) Switching Cost
  Dan Qiao · Ming Yin · Ming Min · Yu-Xiang Wang
- 2020 Poster: An end-to-end Differentially Private Latent Dirichlet Allocation Using a Spectral Algorithm
  Christopher DeCarolis · Mukul A Ram · Seyed Esmaeili · Yu-Xiang Wang · Furong Huang
- 2019 Poster: Poission Subsampled Rényi Differential Privacy
  Yuqing Zhu · Yu-Xiang Wang
- 2019 Oral: Poission Subsampled Rényi Differential Privacy
  Yuqing Zhu · Yu-Xiang Wang
- 2018 Poster: Detecting and Correcting for Label Shift with Black Box Predictors
  Zachary Lipton · Yu-Xiang Wang · Alexander Smola
- 2018 Oral: Detecting and Correcting for Label Shift with Black Box Predictors
  Zachary Lipton · Yu-Xiang Wang · Alexander Smola
- 2018 Poster: Improving the Gaussian Mechanism for Differential Privacy: Analytical Calibration and Optimal Denoising
  Borja de Balle Pigem · Yu-Xiang Wang
- 2018 Oral: Improving the Gaussian Mechanism for Differential Privacy: Analytical Calibration and Optimal Denoising
  Borja de Balle Pigem · Yu-Xiang Wang
- 2018 Poster: signSGD: Compressed Optimisation for Non-Convex Problems
  Jeremy Bernstein · Yu-Xiang Wang · Kamyar Azizzadenesheli · Anima Anandkumar
- 2018 Oral: signSGD: Compressed Optimisation for Non-Convex Problems
  Jeremy Bernstein · Yu-Xiang Wang · Kamyar Azizzadenesheli · Anima Anandkumar
- 2017 Poster: Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
  Yu-Xiang Wang · Alekh Agarwal · Miroslav Dudik
- 2017 Talk: Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
  Yu-Xiang Wang · Alekh Agarwal · Miroslav Dudik