Timezone: »
Corruption Robust Offline Reinforcement Learning
Xuezhou Zhang · Yiding Chen · Jerry Zhu · Wen Sun
We study the adversarial robustness in offline reinforcement learning. Given a batch dataset consisting of tuples $(s, a, r, s')$, an adversary is allowed to arbitrarily modify $\epsilon$ fraction of the tuples. From the corrupted dataset the learner aims to robustly identify a near-optimal policy. We first show that a worst-case $\Omega(d\epsilon)$ optimality gap is unavoidable in linear MDP of dimension $d$, even if the adversary only corrupts the reward element in a tuple.
This contrasts with dimension-free results in robust supervised learning and best-known lower-bound in the online RL setting with corruption. Next, we propose a robust variant of the Least-Square Value Iteration (LSVI) algorithm utilizing robust supervised learning oracles, that achieves near-matching performances in cases both with and without full data coverage. The algorithm requires the knowledge of $\epsilon$ to design the pessimism bonus in the no-coverage case. Surprisingly, in this case, the knowledge of $\epsilon$ is necessary, as we show that being adaptive to unknown $\epsilon$ is impossible. This again contrasts with recent results on corruption-robust online RL and implies that robust offline RL is a strictly harder problem.
Author Information
Xuezhou Zhang (UW-Madison)
Yiding Chen (University of Wisconsin-Madison)
Jerry Zhu (University of Wisconsin-Madison)
Wen Sun (Cornell University)
More from the Same Authors
-
2021 : Mitigating Covariate Shift in Imitation Learning via Offline Data Without Great Coverage »
Jonathan Chang · Masatoshi Uehara · Dhruv Sreenivas · Rahul Kidambi · Wen Sun -
2021 : MobILE: Model-Based Imitation Learning From Observation Alone »
Rahul Kidambi · Jonathan Chang · Wen Sun -
2023 : Representation Learning in Low-rank Slate-based Recommender Systems »
Yijia Dai · Wen Sun -
2023 : Provable Offline Reinforcement Learning with Human Feedback »
Wenhao Zhan · Masatoshi Uehara · Nathan Kallus · Jason Lee · Wen Sun -
2023 : Contextual Bandits and Imitation Learning with Preference-Based Active Queries »
Ayush Sekhari · Karthik Sridharan · Wen Sun · Runzhe Wu -
2023 : Selective Sampling and Imitation Learning via Online Regression »
Ayush Sekhari · Karthik Sridharan · Wen Sun · Runzhe Wu -
2023 : Provable Offline Reinforcement Learning with Human Feedback »
Wenhao Zhan · Masatoshi Uehara · Nathan Kallus · Jason Lee · Wen Sun -
2023 : How to Query Human Feedback Efficiently in RL? »
Wenhao Zhan · Masatoshi Uehara · Wen Sun · Jason Lee -
2023 : Contextual Bandits and Imitation Learning with Preference-Based Active Queries »
Ayush Sekhari · Karthik Sridharan · Wen Sun · Runzhe Wu -
2023 : How to Query Human Feedback Efficiently in RL? »
Wenhao Zhan · Masatoshi Uehara · Wen Sun · Jason Lee -
2023 Poster: Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR »
Kaiwen Wang · Nathan Kallus · Wen Sun -
2023 Poster: Multi-task Representation Learning for Pure Exploration in Linear Bandits »
Yihan Du · Longbo Huang · Wen Sun -
2023 Poster: Distributional Offline Policy Evaluation with Predictive Error Guarantees »
Runzhe Wu · Masatoshi Uehara · Wen Sun -
2023 Poster: Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings »
Masatoshi Uehara · Ayush Sekhari · Jason Lee · Nathan Kallus · Wen Sun -
2022 Poster: Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning approach »
Xuezhou Zhang · Yuda Song · Masatoshi Uehara · Mengdi Wang · Alekh Agarwal · Wen Sun -
2022 Poster: Learning Bellman Complete Representations for Offline Policy Evaluation »
Jonathan Chang · Kaiwen Wang · Nathan Kallus · Wen Sun -
2022 Poster: Optimal Estimation of Policy Gradient via Double Fitted Iteration »
Chengzhuo Ni · Ruiqi Zhang · Xiang Ji · Xuezhou Zhang · Mengdi Wang -
2022 Poster: Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory »
Ruiqi Zhang · Xuezhou Zhang · Chengzhuo Ni · Mengdi Wang -
2022 Spotlight: Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning approach »
Xuezhou Zhang · Yuda Song · Masatoshi Uehara · Mengdi Wang · Alekh Agarwal · Wen Sun -
2022 Spotlight: Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory »
Ruiqi Zhang · Xuezhou Zhang · Chengzhuo Ni · Mengdi Wang -
2022 Spotlight: Optimal Estimation of Policy Gradient via Double Fitted Iteration »
Chengzhuo Ni · Ruiqi Zhang · Xiang Ji · Xuezhou Zhang · Mengdi Wang -
2022 Oral: Learning Bellman Complete Representations for Offline Policy Evaluation »
Jonathan Chang · Kaiwen Wang · Nathan Kallus · Wen Sun -
2022 Poster: Out-of-Distribution Detection with Deep Nearest Neighbors »
Yiyou Sun · Yifei Ming · Jerry Zhu · Sharon Li -
2022 Spotlight: Out-of-Distribution Detection with Deep Nearest Neighbors »
Yiyou Sun · Yifei Ming · Jerry Zhu · Sharon Li -
2021 Poster: Fairness of Exposure in Stochastic Bandits »
Luke Lequn Wang · Yiwei Bai · Wen Sun · Thorsten Joachims -
2021 Spotlight: Fairness of Exposure in Stochastic Bandits »
Luke Lequn Wang · Yiwei Bai · Wen Sun · Thorsten Joachims -
2021 Poster: Robust Policy Gradient against Strong Data Corruption »
Xuezhou Zhang · Yiding Chen · Jerry Zhu · Wen Sun -
2021 Spotlight: Robust Policy Gradient against Strong Data Corruption »
Xuezhou Zhang · Yiding Chen · Jerry Zhu · Wen Sun -
2021 Poster: Bilinear Classes: A Structural Framework for Provable Generalization in RL »
Simon Du · Sham Kakade · Jason Lee · Shachar Lovett · Gaurav Mahajan · Wen Sun · Ruosong Wang -
2021 Oral: Bilinear Classes: A Structural Framework for Provable Generalization in RL »
Simon Du · Sham Kakade · Jason Lee · Shachar Lovett · Gaurav Mahajan · Wen Sun · Ruosong Wang -
2021 Poster: PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration »
Yuda Song · Wen Sun -
2021 Spotlight: PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration »
Yuda Song · Wen Sun -
2020 Poster: Adaptive Reward-Poisoning Attacks against Reinforcement Learning »
Xuezhou Zhang · Yuzhe Ma · Adish Singla · Jerry Zhu -
2020 Poster: Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning »
Amin Rakhsha · Goran Radanovic · Rati Devidze · Jerry Zhu · Adish Singla -
2019 Poster: Teaching a black-box learner »
Sanjoy Dasgupta · Daniel Hsu · Stefanos Poulis · Jerry Zhu -
2019 Oral: Teaching a black-box learner »
Sanjoy Dasgupta · Daniel Hsu · Stefanos Poulis · Jerry Zhu