Poster
Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP
Jiacheng Guo · Zihao Li · Huazheng Wang · Mengdi Wang · Zhuoran Yang · Xuezhou Zhang
In this paper, we study representation learning in partially observable Markov Decision Processes (POMDPs), where the agent learns a decoder function that maps a sequence of high-dimensional raw observations to a compact representation and uses that representation for more efficient exploration and planning. We focus on the sub-classes of *$\gamma$-observable* and *decodable POMDPs*, for which statistically tractable learning has been shown to be possible but no computationally efficient algorithm has been available. We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU) to perform representation learning and achieve efficient sample complexity, while calling only supervised learning computational oracles. We then show how to adapt this algorithm to the broader class of $\gamma$-observable POMDPs.
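The abstract does not spell out the algorithm, so the following is only a generic illustration of the optimism-in-the-face-of-uncertainty principle it mentions, not the authors' method. It assumes a decoder has already mapped observation histories to a small set of latent states, and it runs finite-horizon value iteration on the empirical tabular model with a count-based bonus (a UCB-style choice swapped in for illustration). The function name, its inputs, and the bonus form are hypothetical.

```python
import numpy as np


def optimistic_value_iteration(counts, reward_sums, horizon, bonus_scale=1.0):
    """Finite-horizon value iteration with a count-based optimism bonus.

    Hypothetical illustration of OFU, not the paper's algorithm.
    counts[s, a, s2]: observed transitions (s, a) -> s2 between decoded latent states.
    reward_sums[s, a]: sum of observed rewards at (s, a), with rewards in [0, 1].
    Returns Q with shape (horizon, S, A) and the greedy policy with shape (horizon, S).
    """
    S, A, _ = counts.shape
    n_sa = counts.sum(axis=2)                      # visits to each (s, a)
    n_safe = np.maximum(n_sa, 1)                   # avoid division by zero
    p_hat = counts / n_safe[:, :, None]            # empirical transition model
    r_hat = reward_sums / n_safe                   # empirical mean reward
    bonus = bonus_scale / np.sqrt(n_safe)          # shrinks as (s, a) is visited more

    Q = np.zeros((horizon, S, A))
    V_next = np.zeros(S)
    for h in reversed(range(horizon)):
        # Optimistic Bellman backup: p_hat (S, A, S) @ V_next (S,) -> (S, A).
        q = r_hat + bonus + p_hat @ V_next
        # Unvisited pairs receive the largest achievable return (most optimistic).
        q = np.where(n_sa == 0, horizon - h, q)
        # Clip to the largest return achievable from step h onward.
        q = np.minimum(q, horizon - h)
        Q[h] = q
        V_next = q.max(axis=1)
    return Q, Q.argmax(axis=2)
```

For example, with two latent states and two actions where action 1 in state 0 has never been tried, the optimistic policy selects that action at every step, which is how OFU drives exploration toward poorly understood parts of the decoded state space.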
Author Information
Jiacheng Guo (Fudan University)
Zihao Li (Princeton University)
Huazheng Wang (Oregon State University)
Mengdi Wang (Princeton University)
Zhuoran Yang (Yale University)
Xuezhou Zhang (Princeton University)
More from the Same Authors
2023: Efficient RL with Impaired Observability: Learning to Act with Delayed and Missing State Observations
Minshuo Chen · Yu Bai · H. Vincent Poor · Mengdi Wang
2023: Scaling In-Context Demonstrations with Structured Attention
Tianle Cai · Kaixuan Huang · Jason Lee · Mengdi Wang · Danqi Chen
2023: Sample-Efficient Learning of POMDPs with Multiple Observations In Hindsight
Jiacheng Guo · Minshuo Chen · Huan Wang · Caiming Xiong · Mengdi Wang · Yu Bai
2023: Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism
Zihao Li
2023: Principal-Driven Reward Design and Agent Policy Alignment via Bilevel-RL
Souradip Chakraborty · Amrit Bedi · Alec Koppel · Furong Huang · Mengdi Wang
2023: Reinforcement learning with Human Feedback: Learning Dynamic Choices via Pessimism
Zihao Li · Zhuoran Yang · Mengdi Wang
2023: Visual Adversarial Examples Jailbreak Aligned Large Language Models
Xiangyu Qi · Kaixuan Huang · Ashwinee Panda · Mengdi Wang · Prateek Mittal
2023 Poster: Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data
Minshuo Chen · Kaixuan Huang · Tuo Zhao · Mengdi Wang
2023 Poster: Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning
Yulai Zhao · Zhuoran Yang · Zhaoran Wang · Jason Lee
2023 Poster: STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning
Souradip Chakraborty · Amrit Bedi · Alec Koppel · Mengdi Wang · Furong Huang · Dinesh Manocha
2023 Poster: Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments
Yixuan Wang · Simon Zhan · Ruochen Jiao · Zhilu Wang · Wanxin Jin · Zhuoran Yang · Zhaoran Wang · Chao Huang · Qi Zhu
2023 Poster: Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories
Zixuan Zhang · Minshuo Chen · Mengdi Wang · Wenjing Liao · Tuo Zhao
2023 Poster: Learning to Incentivize Information Acquisition: Proper Scoring Rules Meet Principal-Agent Model
Siyu Chen · Jibang Wu · Yifan Wu · Zhuoran Yang
2022: Policy Gradient: Theory for Making Best Use of It
Mengdi Wang
2022 Poster: Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning approach
Xuezhou Zhang · Yuda Song · Masatoshi Uehara · Mengdi Wang · Alekh Agarwal · Wen Sun
2022 Poster: When Are Linear Stochastic Bandits Attackable?
Huazheng Wang · Haifeng Xu · Hongning Wang
2022 Poster: Optimal Estimation of Policy Gradient via Double Fitted Iteration
Chengzhuo Ni · Ruiqi Zhang · Xiang Ji · Xuezhou Zhang · Mengdi Wang
2022 Poster: Learning from Demonstration: Provably Efficient Adversarial Policy Imitation with Linear Function Approximation
ZHIHAN LIU · Yufeng Zhang · Zuyue Fu · Zhuoran Yang · Zhaoran Wang
2022 Poster: Provably Efficient Offline Reinforcement Learning for Partially Observable Markov Decision Processes
Hongyi Guo · Qi Cai · Yufeng Zhang · Zhuoran Yang · Zhaoran Wang
2022 Poster: Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets
Han Zhong · Wei Xiong · Jiyuan Tan · Liwei Wang · Tong Zhang · Zhaoran Wang · Zhuoran Yang
2022 Poster: Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory
Ruiqi Zhang · Xuezhou Zhang · Chengzhuo Ni · Mengdi Wang
2022 Spotlight: Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning approach
Xuezhou Zhang · Yuda Song · Masatoshi Uehara · Mengdi Wang · Alekh Agarwal · Wen Sun
2022 Spotlight: When Are Linear Stochastic Bandits Attackable?
Huazheng Wang · Haifeng Xu · Hongning Wang
2022 Spotlight: Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory
Ruiqi Zhang · Xuezhou Zhang · Chengzhuo Ni · Mengdi Wang
2022 Spotlight: Provably Efficient Offline Reinforcement Learning for Partially Observable Markov Decision Processes
Hongyi Guo · Qi Cai · Yufeng Zhang · Zhuoran Yang · Zhaoran Wang
2022 Spotlight: Optimal Estimation of Policy Gradient via Double Fitted Iteration
Chengzhuo Ni · Ruiqi Zhang · Xiang Ji · Xuezhou Zhang · Mengdi Wang
2022 Spotlight: Learning from Demonstration: Provably Efficient Adversarial Policy Imitation with Linear Function Approximation
ZHIHAN LIU · Yufeng Zhang · Zuyue Fu · Zhuoran Yang · Zhaoran Wang
2022 Spotlight: Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets
Han Zhong · Wei Xiong · Jiyuan Tan · Liwei Wang · Tong Zhang · Zhaoran Wang · Zhuoran Yang
2022 Poster: Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency
Qi Cai · Zhuoran Yang · Zhaoran Wang
2022 Poster: Adaptive Model Design for Markov Decision Process
Siyu Chen · Donglin Yang · Jiayang Li · Senmiao Wang · Zhuoran Yang · Zhaoran Wang
2022 Spotlight: Adaptive Model Design for Markov Decision Process
Siyu Chen · Donglin Yang · Jiayang Li · Senmiao Wang · Zhuoran Yang · Zhaoran Wang
2022 Spotlight: Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency
Qi Cai · Zhuoran Yang · Zhaoran Wang
2022 Poster: Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning
Boxiang Lyu · Zhaoran Wang · Mladen Kolar · Zhuoran Yang
2022 Poster: Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning
Shuang Qiu · Lingxiao Wang · Chenjia Bai · Zhuoran Yang · Zhaoran Wang
2022 Poster: Welfare Maximization in Competitive Equilibrium: Reinforcement Learning for Markov Exchange Economy
ZHIHAN LIU · Lu Miao · Zhaoran Wang · Michael Jordan · Zhuoran Yang
2022 Poster: Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation
Xiaoyu Chen · Han Zhong · Zhuoran Yang · Zhaoran Wang · Liwei Wang
2022 Spotlight: Welfare Maximization in Competitive Equilibrium: Reinforcement Learning for Markov Exchange Economy
ZHIHAN LIU · Lu Miao · Zhaoran Wang · Michael Jordan · Zhuoran Yang
2022 Spotlight: Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning
Boxiang Lyu · Zhaoran Wang · Mladen Kolar · Zhuoran Yang
2022 Spotlight: Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation
Xiaoyu Chen · Han Zhong · Zhuoran Yang · Zhaoran Wang · Liwei Wang
2022 Spotlight: Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning
Shuang Qiu · Lingxiao Wang · Chenjia Bai · Zhuoran Yang · Zhaoran Wang
2021 Poster: Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient
Botao Hao · Yaqi Duan · Tor Lattimore · Csaba Szepesvari · Mengdi Wang
2021 Spotlight: Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient
Botao Hao · Yaqi Duan · Tor Lattimore · Csaba Szepesvari · Mengdi Wang
2021 Poster: Bootstrapping Fitted Q-Evaluation for Off-Policy Inference
Botao Hao · Xiang Ji · Yaqi Duan · Hao Lu · Csaba Szepesvari · Mengdi Wang
2021 Spotlight: Bootstrapping Fitted Q-Evaluation for Off-Policy Inference
Botao Hao · Xiang Ji · Yaqi Duan · Hao Lu · Csaba Szepesvari · Mengdi Wang
2020: QA for invited talk 7 Wang
Mengdi Wang
2020: Invited talk 7 Wang
Mengdi Wang
2020 Workshop: Theoretical Foundations of Reinforcement Learning
Emma Brunskill · Thodoris Lykouris · Max Simchowitz · Wen Sun · Mengdi Wang
2020 Poster: Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound
Lin Yang · Mengdi Wang
2020 Poster: Model-Based Reinforcement Learning with Value-Targeted Regression
Alex Ayoub · Zeyu Jia · Csaba Szepesvari · Mengdi Wang · Lin Yang
2020 Poster: Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation
Yaqi Duan · Zeyu Jia · Mengdi Wang
2019 Poster: Sample-Optimal Parametric Q-Learning Using Linearly Additive Features
Lin Yang · Mengdi Wang
2019 Oral: Sample-Optimal Parametric Q-Learning Using Linearly Additive Features
Lin Yang · Mengdi Wang
2018 Poster: Estimation of Markov Chain via Rank-constrained Likelihood
XUDONG LI · Mengdi Wang · Anru Zhang
2018 Oral: Estimation of Markov Chain via Rank-constrained Likelihood
XUDONG LI · Mengdi Wang · Anru Zhang
2018 Poster: Scalable Bilinear Pi Learning Using State and Action Features
Yichen Chen · Lihong Li · Mengdi Wang
2018 Oral: Scalable Bilinear Pi Learning Using State and Action Features
Yichen Chen · Lihong Li · Mengdi Wang
2017 Poster: Strong NP-Hardness for Sparse Optimization with Concave Penalty Functions
Yichen Chen · Dongdong Ge · Mengdi Wang · Zizhuo Wang · Yinyu Ye · Hao Yin
2017 Talk: Strong NP-Hardness for Sparse Optimization with Concave Penalty Functions
Yichen Chen · Dongdong Ge · Mengdi Wang · Zizhuo Wang · Yinyu Ye · Hao Yin