Timezone: »
Near-Optimal Offline Reinforcement Learning via Double Variance Reduction
Ming Yin · Yu Bai · Yu-Xiang Wang
We consider the problem of offline reinforcement learning (RL) --- a well-motivated setting of RL that aims at policy optimization using only historical data. In this paper, we propose \emph{Off-Policy Double Variance Reduction} (OPDVR), a new variance reduction based algorithm for offline RL. Our main result shows that OPDVR provably identifies an $\epsilon$-optimal policy with $\widetilde{O}(H^2/d_m\epsilon^2)$ episodes of offline data in the finite-horizon \emph{stationary transition} setting and this improves over the best known upper bound by a factor of $H$. Moreover, we establish an information-theoretic lower bound of $\Omega(H^2/d_m\epsilon^2)$ which certifies that OPDVR is optimal up to logarithmic factors.
Lastly, we show that OPDVR also achieves rate-optimal sample complexity under alternative settings such as the finite-horizon MDPs with non-stationary transitions and the infinite horizon MDPs with discounted rewards.
Author Information
Ming Yin (UC Santa Barbara)
Yu Bai (Salesforce Research)
Yu-Xiang Wang (UC Santa Barbara)

Yu-Xiang Wang is the Eugene Aas Assistant Professor of Computer Science at UCSB. He runs the Statistical Machine Learning lab and co-founded the UCSB Center for Responsible Machine Learning. He is also visiting Amazon Web Services. Yu-Xiang’s research interests include statistical theory and methodology, differential privacy, reinforcement learning, online learning and deep learning.
More from the Same Authors
-
2021 : Privately Publishable Per-instance Privacy: An Extended Abstract »
Rachel Redberg · Yu-Xiang Wang -
2021 : Optimal Accounting of Differential Privacy via Characteristic Function »
Yuqing Zhu · Jinshuo Dong · Yu-Xiang Wang -
2021 : Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings »
Ming Yin · Yu-Xiang Wang -
2021 : Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning »
Tengyang Xie · Nan Jiang · Huan Wang · Caiming Xiong · Yu Bai -
2021 : Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games »
Yu Bai · Chi Jin · Huan Wang · Caiming Xiong -
2022 : Optimal Dynamic Regret in LQR Control »
Dheeraj Baby · Yu-Xiang Wang -
2023 : A Privacy-Friendly Approach to Data Valuation »
Jiachen Wang · Yuqing Zhu · Yu-Xiang Wang · Ruoxi Jia · Prateek Mittal -
2023 : Why Quantization Improves Generalization: NTK of Binary Weight Neural Network »
Kaiqi Zhang · Ming Yin · Yu-Xiang Wang -
2023 : Generative Autoencoders as Watermark Attackers: Analyses of Vulnerabilities and Threats »
Xuandong Zhao · Kexun Zhang · Yu-Xiang Wang · Lei Li -
2023 : Provable Robust Watermarking for AI-Generated Text »
Xuandong Zhao · Prabhanjan Ananth · Lei Li · Yu-Xiang Wang -
2023 Poster: Offline Reinforcement Learning with Closed-Form Policy Improvement Operators »
Jiachen Li · Edwin Zhang · Ming Yin · Jerry Bai · Yu-Xiang Wang · William Wang -
2023 Poster: Protecting Language Generation Models via Invisible Watermarking »
Xuandong Zhao · Yu-Xiang Wang · Lei Li -
2023 Poster: Differentially Private Optimization on Large Model at Small Cost »
Zhiqi Bu · Yu-Xiang Wang · Sheng Zha · George Karypis -
2023 Poster: Non-stationary Reinforcement Learning under General Function Approximation »
Songtao Feng · Ming Yin · Ruiquan Huang · Yu-Xiang Wang · Jing Yang · Yingbin LIANG -
2023 Poster: Global Optimization with Parametric Function Approximation »
Chong Liu · Yu-Xiang Wang -
2023 Expo Talk Panel: Generative AI and Science »
Srinivasan Sengamedu · Joseph Sirosh · Yu-Xiang Wang · Alexander Amini -
2022 Poster: Sample-Efficient Reinforcement Learning with loglog(T) Switching Cost »
Dan Qiao · Ming Yin · Ming Min · Yu-Xiang Wang -
2022 Spotlight: Sample-Efficient Reinforcement Learning with loglog(T) Switching Cost »
Dan Qiao · Ming Yin · Ming Min · Yu-Xiang Wang -
2022 Poster: Near-Optimal Learning of Extensive-Form Games with Imperfect Information »
Yu Bai · Chi Jin · Song Mei · Tiancheng Yu -
2022 Spotlight: Near-Optimal Learning of Extensive-Form Games with Imperfect Information »
Yu Bai · Chi Jin · Song Mei · Tiancheng Yu -
2021 : Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games »
Yu Bai · Chi Jin · Huan Wang · Caiming Xiong -
2021 Poster: A Sharp Analysis of Model-based Reinforcement Learning with Self-Play »
Qinghua Liu · Tiancheng Yu · Yu Bai · Chi Jin -
2021 Poster: How Important is the Train-Validation Split in Meta-Learning? »
Yu Bai · Minshuo Chen · Pan Zhou · Tuo Zhao · Jason Lee · Sham Kakade · Huan Wang · Caiming Xiong -
2021 Spotlight: A Sharp Analysis of Model-based Reinforcement Learning with Self-Play »
Qinghua Liu · Tiancheng Yu · Yu Bai · Chi Jin -
2021 Spotlight: How Important is the Train-Validation Split in Meta-Learning? »
Yu Bai · Minshuo Chen · Pan Zhou · Tuo Zhao · Jason Lee · Sham Kakade · Huan Wang · Caiming Xiong -
2021 Poster: Don’t Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification »
Yu Bai · Song Mei · Huan Wang · Caiming Xiong -
2021 Poster: Exact Gap between Generalization Error and Uniform Convergence in Random Feature Models »
Zitong Yang · Yu Bai · Song Mei -
2021 Spotlight: Exact Gap between Generalization Error and Uniform Convergence in Random Feature Models »
Zitong Yang · Yu Bai · Song Mei -
2021 Spotlight: Don’t Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification »
Yu Bai · Song Mei · Huan Wang · Caiming Xiong -
2020 Poster: Provable Self-Play Algorithms for Competitive Reinforcement Learning »
Yu Bai · Chi Jin -
2020 Poster: An end-to-end Differentially Private Latent Dirichlet Allocation Using a Spectral Algorithm »
Christopher DeCarolis · Mukul A Ram · Seyed Esmaeili · Yu-Xiang Wang · Furong Huang -
2019 Poster: Poission Subsampled R\'enyi Differential Privacy »
Yuqing Zhu · Yu-Xiang Wang -
2019 Oral: Poission Subsampled R\'enyi Differential Privacy »
Yuqing Zhu · Yu-Xiang Wang -
2018 Poster: Detecting and Correcting for Label Shift with Black Box Predictors »
Zachary Lipton · Yu-Xiang Wang · Alexander Smola -
2018 Oral: Detecting and Correcting for Label Shift with Black Box Predictors »
Zachary Lipton · Yu-Xiang Wang · Alexander Smola -
2018 Poster: Improving the Gaussian Mechanism for Differential Privacy: Analytical Calibration and Optimal Denoising »
Borja de Balle Pigem · Yu-Xiang Wang -
2018 Oral: Improving the Gaussian Mechanism for Differential Privacy: Analytical Calibration and Optimal Denoising »
Borja de Balle Pigem · Yu-Xiang Wang -
2018 Poster: signSGD: Compressed Optimisation for Non-Convex Problems »
Jeremy Bernstein · Yu-Xiang Wang · Kamyar Azizzadenesheli · Anima Anandkumar -
2018 Oral: signSGD: Compressed Optimisation for Non-Convex Problems »
Jeremy Bernstein · Yu-Xiang Wang · Kamyar Azizzadenesheli · Anima Anandkumar -
2017 Poster: Optimal and Adaptive Off-policy Evaluation in Contextual Bandits »
Yu-Xiang Wang · Alekh Agarwal · Miroslav Dudik -
2017 Talk: Optimal and Adaptive Off-policy Evaluation in Contextual Bandits »
Yu-Xiang Wang · Alekh Agarwal · Miroslav Dudik