We study offline reinforcement learning (RL) for partially observable Markov decision processes (POMDPs) with possibly infinite state and observation spaces. Under the undercompleteness assumption, the optimal policy in such POMDPs is characterized by a class of finite-memory Bellman operators. In the offline setting, estimating these operators directly is challenging due to (i) the large observation space and (ii) insufficient coverage of the offline dataset. To tackle these challenges, we propose a novel algorithm that constructs confidence regions for these Bellman operators via offline estimation of their RKHS embeddings, and returns the final policy via pessimistic planning within the confidence regions. We prove that the proposed algorithm attains an \(\epsilon\)-optimal policy using an offline dataset containing \(\tilde{\mathcal{O}}(1/\epsilon^2)\) episodes, provided that the behavior policy has good coverage over the optimal trajectory. To the best of our knowledge, our algorithm is the first provably sample-efficient offline algorithm for POMDPs that does not require a uniform coverage assumption.
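As a rough illustration of the two ideas named in the abstract (a confidence region defined through RKHS, i.e. kernel mean, embeddings estimated from offline data, and pessimistic planning inside that region), here is a minimal Python sketch on a hypothetical one-step toy problem. Everything in it is an assumption made for illustration: the finite set of candidate models, the discrete observation distributions, the Gaussian kernel, the squared-MMD radius of 0.1, and all function names (`mmd_squared`, `confidence_region`, `pessimistic_action`). It is not the paper's algorithm, which works with embeddings of finite-memory Bellman operators in a POMDP.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy setting (not the paper's POMDP model): each candidate
# "model" maps an action to a discrete distribution over scalar observations,
# written as (support points, probability weights). Reward = observation value.
Dist = Tuple[Sequence[float], Sequence[float]]
Model = Dict[int, Dist]


def gaussian_kernel(x: float, y: float, bandwidth: float = 1.0) -> float:
    """Gaussian (RBF) kernel; the mean embeddings live in its RKHS."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(dist: Dist, samples: Sequence[float], bandwidth: float = 1.0) -> float:
    """Squared RKHS distance between the mean embedding of a discrete
    distribution and the empirical mean embedding of `samples` (V-statistic)."""
    support, weights = dist
    n = len(samples)
    pp = sum(wi * wj * gaussian_kernel(xi, xj, bandwidth)
             for xi, wi in zip(support, weights)
             for xj, wj in zip(support, weights))
    pq = sum(wi * gaussian_kernel(xi, y, bandwidth)
             for xi, wi in zip(support, weights)
             for y in samples) / n
    qq = sum(gaussian_kernel(y1, y2, bandwidth)
             for y1 in samples for y2 in samples) / (n * n)
    return pp - 2.0 * pq + qq


def confidence_region(models: Sequence[Model], data: Dict[int, List[float]],
                      radius: float, bandwidth: float = 1.0) -> List[Model]:
    """Keep candidates whose embedding is within `radius` (squared MMD) of the
    empirical embedding for every action observed in the data. Actions with no
    data impose no constraint, mimicking missing coverage."""
    return [m for m in models
            if all(mmd_squared(m[a], ys, bandwidth) <= radius
                   for a, ys in data.items() if ys)]


def expected_reward(dist: Dist) -> float:
    """Mean observation value, used as the reward in this toy."""
    support, weights = dist
    return sum(x * w for x, w in zip(support, weights))


def pessimistic_action(region: Sequence[Model], actions: Sequence[int]) -> int:
    """Choose the action with the best worst-case expected reward in the region."""
    return max(actions, key=lambda a: min(expected_reward(m[a]) for m in region))


# Usage: action 0 is covered by the offline data, action 1 is never taken.
model_a = {0: ([0.5], [1.0]), 1: ([1.0], [1.0])}
model_b = {0: ([0.5], [1.0]), 1: ([0.0], [1.0])}
model_c = {0: ([2.0], [1.0]), 1: ([3.0], [1.0])}  # inconsistent with the data
data = {0: [0.5, 0.5, 0.5, 0.5], 1: []}

region = confidence_region([model_a, model_b, model_c], data, radius=0.1)
best = pessimistic_action(region, actions=[0, 1])
print(len(region), best)  # prints: 2 0
```

Because action 1 never appears in the data, both surviving candidates stay in the region, so the pessimistic rule scores action 1 by its worst case (0.0) rather than its best case (1.0) and picks the covered action 0. This is the mechanism that lets pessimism rely on partial rather than uniform coverage.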
Author Information
Hongyi Guo (Northwestern University)
Qi Cai (Northwestern University)
Yufeng Zhang (Northwestern University)
Zhuoran Yang (Yale University)
Zhaoran Wang (Northwestern University)
Related Events (a corresponding poster, oral, or spotlight)
2022 Spotlight: Provably Efficient Offline Reinforcement Learning for Partially Observable Markov Decision Processes
Thu. Jul 21st 06:40 -- 06:45 PM, Room 301 - 303
More from the Same Authors
2021: Optimistic Exploration with Backward Bootstrapped Bonus for Deep Reinforcement Learning
Chenjia Bai · Lingxiao Wang · Lei Han · Jianye Hao · Animesh Garg · Peng Liu · Zhaoran Wang
2021: Is Pessimism Provably Efficient for Offline RL?
Ying Jin · Zhuoran Yang · Zhaoran Wang
2023: Reinforcement learning with Human Feedback: Learning Dynamic Choices via Pessimism
Zihao Li · Zhuoran Yang · Mengdi Wang
2023 Poster: Behavior Contrastive Learning for Unsupervised Skill Discovery
Rushuai Yang · Chenjia Bai · Hongyi Guo · Siyuan Li · Bin Zhao · Zhen Wang · Peng Liu · Xuelong Li
2023 Poster: Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning
Yulai Zhao · Zhuoran Yang · Zhaoran Wang · Jason Lee
2023 Poster: Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments
Yixuan Wang · Simon Zhan · Ruochen Jiao · Zhilu Wang · Wanxin Jin · Zhuoran Yang · Zhaoran Wang · Chao Huang · Qi Zhu
2023 Poster: Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP
Jiacheng Guo · Zihao Li · Huazheng Wang · Mengdi Wang · Zhuoran Yang · Xuezhou Zhang
2023 Poster: Adaptive Barrier Smoothing for First-Order Policy Gradient with Contact Dynamics
Shenao Zhang · Wanxin Jin · Zhaoran Wang
2023 Poster: Learning to Incentivize Information Acquisition: Proper Scoring Rules Meet Principal-Agent Model
Siyu Chen · Jibang Wu · Yifan Wu · Zhuoran Yang
2023 Poster: Achieving Hierarchy-Free Approximation for Bilevel Programs with Equilibrium Constraints
Jiayang Li · Jing Yu · Boyi Liu · Yu Nie · Zhaoran Wang
2022 Poster: Learning from Demonstration: Provably Efficient Adversarial Policy Imitation with Linear Function Approximation
ZHIHAN LIU · Yufeng Zhang · Zuyue Fu · Zhuoran Yang · Zhaoran Wang
2022 Poster: Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets
Han Zhong · Wei Xiong · Jiyuan Tan · Liwei Wang · Tong Zhang · Zhaoran Wang · Zhuoran Yang
2022 Spotlight: Learning from Demonstration: Provably Efficient Adversarial Policy Imitation with Linear Function Approximation
ZHIHAN LIU · Yufeng Zhang · Zuyue Fu · Zhuoran Yang · Zhaoran Wang
2022 Spotlight: Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets
Han Zhong · Wei Xiong · Jiyuan Tan · Liwei Wang · Tong Zhang · Zhaoran Wang · Zhuoran Yang
2022 Poster: Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency
Qi Cai · Zhuoran Yang · Zhaoran Wang
2022 Poster: Adaptive Model Design for Markov Decision Process
Siyu Chen · Donglin Yang · Jiayang Li · Senmiao Wang · Zhuoran Yang · Zhaoran Wang
2022 Spotlight: Adaptive Model Design for Markov Decision Process
Siyu Chen · Donglin Yang · Jiayang Li · Senmiao Wang · Zhuoran Yang · Zhaoran Wang
2022 Spotlight: Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency
Qi Cai · Zhuoran Yang · Zhaoran Wang
2022 Poster: Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning
Boxiang Lyu · Zhaoran Wang · Mladen Kolar · Zhuoran Yang
2022 Poster: Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning
Shuang Qiu · Lingxiao Wang · Chenjia Bai · Zhuoran Yang · Zhaoran Wang
2022 Poster: Welfare Maximization in Competitive Equilibrium: Reinforcement Learning for Markov Exchange Economy
ZHIHAN LIU · Lu Miao · Zhaoran Wang · Michael Jordan · Zhuoran Yang
2022 Poster: Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation
Xiaoyu Chen · Han Zhong · Zhuoran Yang · Zhaoran Wang · Liwei Wang
2022 Spotlight: Welfare Maximization in Competitive Equilibrium: Reinforcement Learning for Markov Exchange Economy
ZHIHAN LIU · Lu Miao · Zhaoran Wang · Michael Jordan · Zhuoran Yang
2022 Spotlight: Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning
Boxiang Lyu · Zhaoran Wang · Mladen Kolar · Zhuoran Yang
2022 Spotlight: Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation
Xiaoyu Chen · Han Zhong · Zhuoran Yang · Zhaoran Wang · Liwei Wang
2022 Spotlight: Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning
Shuang Qiu · Lingxiao Wang · Chenjia Bai · Zhuoran Yang · Zhaoran Wang
2021 Poster: Decentralized Single-Timescale Actor-Critic on Zero-Sum Two-Player Stochastic Games
Hongyi Guo · Zuyue Fu · Zhuoran Yang · Zhaoran Wang
2021 Spotlight: Decentralized Single-Timescale Actor-Critic on Zero-Sum Two-Player Stochastic Games
Hongyi Guo · Zuyue Fu · Zhuoran Yang · Zhaoran Wang
2021 Poster: Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality
Tengyu Xu · Zhuoran Yang · Zhaoran Wang · Yingbin LIANG
2021 Poster: Randomized Exploration in Reinforcement Learning with General Value Function Approximation
Haque Ishfaq · Qiwen Cui · Viet Nguyen · Alex Ayoub · Zhuoran Yang · Zhaoran Wang · Doina Precup · Lin Yang
2021 Poster: Infinite-Dimensional Optimization for Zero-Sum Games via Variational Transport
Lewis Liu · Yufeng Zhang · Zhuoran Yang · Reza Babanezhad · Zhaoran Wang
2021 Spotlight: Infinite-Dimensional Optimization for Zero-Sum Games via Variational Transport
Lewis Liu · Yufeng Zhang · Zhuoran Yang · Reza Babanezhad · Zhaoran Wang
2021 Spotlight: Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality
Tengyu Xu · Zhuoran Yang · Zhaoran Wang · Yingbin LIANG
2021 Spotlight: Randomized Exploration in Reinforcement Learning with General Value Function Approximation
Haque Ishfaq · Qiwen Cui · Viet Nguyen · Alex Ayoub · Zhuoran Yang · Zhaoran Wang · Doina Precup · Lin Yang
2021 Poster: Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions
Shuang Qiu · Xiaohan Wei · Jieping Ye · Zhaoran Wang · Zhuoran Yang
2021 Poster: On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game
Shuang Qiu · Jieping Ye · Zhaoran Wang · Zhuoran Yang
2021 Poster: Principled Exploration via Optimistic Bootstrapping and Backward Induction
Chenjia Bai · Lingxiao Wang · Lei Han · Jianye Hao · Animesh Garg · Peng Liu · Zhaoran Wang
2021 Oral: On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game
Shuang Qiu · Jieping Ye · Zhaoran Wang · Zhuoran Yang
2021 Spotlight: Principled Exploration via Optimistic Bootstrapping and Backward Induction
Chenjia Bai · Lingxiao Wang · Lei Han · Jianye Hao · Animesh Garg · Peng Liu · Zhaoran Wang
2021 Oral: Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions
Shuang Qiu · Xiaohan Wei · Jieping Ye · Zhaoran Wang · Zhuoran Yang
2021 Poster: Learning While Playing in Mean-Field Games: Convergence and Optimality
Qiaomin Xie · Zhuoran Yang · Zhaoran Wang · Andreea Minca
2021 Poster: Is Pessimism Provably Efficient for Offline RL?
Ying Jin · Zhuoran Yang · Zhaoran Wang
2021 Spotlight: Is Pessimism Provably Efficient for Offline RL?
Ying Jin · Zhuoran Yang · Zhaoran Wang
2021 Spotlight: Learning While Playing in Mean-Field Games: Convergence and Optimality
Qiaomin Xie · Zhuoran Yang · Zhaoran Wang · Andreea Minca
2021 Poster: Risk-Sensitive Reinforcement Learning with Function Approximation: A Debiasing Approach
Yingjie Fei · Zhuoran Yang · Zhaoran Wang
2021 Oral: Risk-Sensitive Reinforcement Learning with Function Approximation: A Debiasing Approach
Yingjie Fei · Zhuoran Yang · Zhaoran Wang
2020 Poster: Breaking the Curse of Many Agents: Provable Mean Embedding Q-Iteration for Mean-Field Reinforcement Learning
Lingxiao Wang · Zhuoran Yang · Zhaoran Wang
2020 Poster: Generative Adversarial Imitation Learning with Neural Network Parameterization: Global Optimality and Convergence Rate
Yufeng Zhang · Qi Cai · Zhuoran Yang · Zhaoran Wang
2020 Poster: Provably Efficient Exploration in Policy Optimization
Qi Cai · Zhuoran Yang · Chi Jin · Zhaoran Wang
2020 Poster: On the Global Optimality of Model-Agnostic Meta-Learning
Lingxiao Wang · Qi Cai · Zhuoran Yang · Zhaoran Wang
2020 Poster: Semiparametric Nonlinear Bipartite Graph Representation Learning with Provable Guarantees
Sen Na · Yuwei Luo · Zhuoran Yang · Zhaoran Wang · Mladen Kolar
2019 Poster: On the statistical rate of nonlinear recovery in generative models with heavy-tailed data
Xiaohan Wei · Zhuoran Yang · Zhaoran Wang
2019 Oral: On the statistical rate of nonlinear recovery in generative models with heavy-tailed data
Xiaohan Wei · Zhuoran Yang · Zhaoran Wang
2018 Poster: The Edge Density Barrier: Computational-Statistical Tradeoffs in Combinatorial Inference
Hao Lu · Yuan Cao · Junwei Lu · Han Liu · Zhaoran Wang
2018 Oral: The Edge Density Barrier: Computational-Statistical Tradeoffs in Combinatorial Inference
Hao Lu · Yuan Cao · Junwei Lu · Han Liu · Zhaoran Wang