Poster
Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation
Xiaoyu Chen · Han Zhong · Zhuoran Yang · Zhaoran Wang · Liwei Wang
We study human-in-the-loop reinforcement learning (RL) with trajectory preferences, where instead of receiving a numeric reward at each step, the RL agent only receives preferences over trajectory pairs from a human overseer. The goal of the RL agent is to learn the optimal policy, that is, the policy most preferred by the human overseer. Despite the empirical success of preference-based RL (PbRL) in various real-world applications, its theoretical understanding is limited to the tabular case. In this paper, we propose the first optimistic model-based algorithm for PbRL with general function approximation, which estimates the model using value-targeted regression and calculates the exploratory policies by solving an optimistic planning problem. We prove that our algorithm achieves a regret bound of $\tilde{O} (\operatorname{poly}(d H) \sqrt{K} )$, where $d$ is a complexity measure of the transition and preference models that depends on the Eluder dimension and log-covering numbers, $H$ is the planning horizon, $K$ is the number of episodes, and $\tilde O(\cdot)$ omits logarithmic terms. Our lower bound indicates that our algorithm is near-optimal when specialized to the linear setting. Furthermore, we extend the PbRL problem by formulating a novel problem called RL with $n$-wise comparisons, and provide the first sample-efficient algorithm for this new setting. To the best of our knowledge, this is the first theoretical result for PbRL with (general) function approximation.
Author Information
Xiaoyu Chen (Peking University)
Han Zhong (Peking University)
Zhuoran Yang (Yale University)
Zhaoran Wang (Northwestern University)
Liwei Wang (Peking University)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Spotlight: Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation
Tue. Jul 19th 03:45 -- 03:50 PM Room Hall F
More from the Same Authors
- 2021 : Optimistic Exploration with Backward Bootstrapped Bonus for Deep Reinforcement Learning
Chenjia Bai · Lingxiao Wang · Lei Han · Jianye Hao · Animesh Garg · Peng Liu · Zhaoran Wang
- 2021 : Is Pessimism Provably Efficient for Offline RL?
Ying Jin · Zhuoran Yang · Zhaoran Wang
- 2023 : Reinforcement learning with Human Feedback: Learning Dynamic Choices via Pessimism
Zihao Li · Zhuoran Yang · Mengdi Wang
- 2023 Poster: On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness
Haotian Ye · Xiaoyu Chen · Liwei Wang · Simon Du
- 2023 Poster: Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning
Yulai Zhao · Zhuoran Yang · Zhaoran Wang · Jason Lee
- 2023 Poster: Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments
Yixuan Wang · Simon Zhan · Ruochen Jiao · Zhilu Wang · Wanxin Jin · Zhuoran Yang · Zhaoran Wang · Chao Huang · Qi Zhu
- 2023 Poster: Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP
Jiacheng Guo · Zihao Li · Huazheng Wang · Mengdi Wang · Zhuoran Yang · Xuezhou Zhang
- 2023 Poster: Adaptive Barrier Smoothing for First-Order Policy Gradient with Contact Dynamics
Shenao Zhang · Wanxin Jin · Zhaoran Wang
- 2023 Poster: A Complete Expressiveness Hierarchy for Subgraph GNNs via Subgraph Weisfeiler-Lehman Tests
Bohang Zhang · Guhao Feng · Yiheng Du · Di He · Liwei Wang
- 2023 Oral: On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness
Haotian Ye · Xiaoyu Chen · Liwei Wang · Simon Du
- 2023 Poster: Learning to Incentivize Information Acquisition: Proper Scoring Rules Meet Principal-Agent Model
Siyu Chen · Jibang Wu · Yifan Wu · Zhuoran Yang
- 2023 Poster: Offline Meta Reinforcement Learning with In-Distribution Online Adaptation
Jianhao Wang · Jin Zhang · Haozhe Jiang · Junyu Zhang · Liwei Wang · Chongjie Zhang
- 2023 Poster: Achieving Hierarchy-Free Approximation for Bilevel Programs with Equilibrium Constraints
Jiayang Li · Jing Yu · Boyi Liu · Yu Nie · Zhaoran Wang
- 2022 Poster: A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games
Wei Xiong · Han Zhong · Chengshuai Shi · Cong Shen · Tong Zhang
- 2022 Poster: Learning from Demonstration: Provably Efficient Adversarial Policy Imitation with Linear Function Approximation
ZHIHAN LIU · Yufeng Zhang · Zuyue Fu · Zhuoran Yang · Zhaoran Wang
- 2022 Poster: Provably Efficient Offline Reinforcement Learning for Partially Observable Markov Decision Processes
Hongyi Guo · Qi Cai · Yufeng Zhang · Zhuoran Yang · Zhaoran Wang
- 2022 Poster: Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets
Han Zhong · Wei Xiong · Jiyuan Tan · Liwei Wang · Tong Zhang · Zhaoran Wang · Zhuoran Yang
- 2022 Spotlight: Provably Efficient Offline Reinforcement Learning for Partially Observable Markov Decision Processes
Hongyi Guo · Qi Cai · Yufeng Zhang · Zhuoran Yang · Zhaoran Wang
- 2022 Spotlight: Learning from Demonstration: Provably Efficient Adversarial Policy Imitation with Linear Function Approximation
ZHIHAN LIU · Yufeng Zhang · Zuyue Fu · Zhuoran Yang · Zhaoran Wang
- 2022 Spotlight: Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets
Han Zhong · Wei Xiong · Jiyuan Tan · Liwei Wang · Tong Zhang · Zhaoran Wang · Zhuoran Yang
- 2022 Spotlight: A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games
Wei Xiong · Han Zhong · Chengshuai Shi · Cong Shen · Tong Zhang
- 2022 Poster: Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency
Qi Cai · Zhuoran Yang · Zhaoran Wang
- 2022 Poster: Adaptive Model Design for Markov Decision Process
Siyu Chen · Donglin Yang · Jiayang Li · Senmiao Wang · Zhuoran Yang · Zhaoran Wang
- 2022 Spotlight: Adaptive Model Design for Markov Decision Process
Siyu Chen · Donglin Yang · Jiayang Li · Senmiao Wang · Zhuoran Yang · Zhaoran Wang
- 2022 Spotlight: Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency
Qi Cai · Zhuoran Yang · Zhaoran Wang
- 2022 Poster: Nearly Optimal Policy Optimization with Stable at Any Time Guarantee
Tianhao Wu · Yunchang Yang · Han Zhong · Liwei Wang · Simon Du · Jiantao Jiao
- 2022 Poster: Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning
Boxiang Lyu · Zhaoran Wang · Mladen Kolar · Zhuoran Yang
- 2022 Poster: Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning
Shuang Qiu · Lingxiao Wang · Chenjia Bai · Zhuoran Yang · Zhaoran Wang
- 2022 Poster: Welfare Maximization in Competitive Equilibrium: Reinforcement Learning for Markov Exchange Economy
ZHIHAN LIU · Lu Miao · Zhaoran Wang · Michael Jordan · Zhuoran Yang
- 2022 Spotlight: Welfare Maximization in Competitive Equilibrium: Reinforcement Learning for Markov Exchange Economy
ZHIHAN LIU · Lu Miao · Zhaoran Wang · Michael Jordan · Zhuoran Yang
- 2022 Spotlight: Nearly Optimal Policy Optimization with Stable at Any Time Guarantee
Tianhao Wu · Yunchang Yang · Han Zhong · Liwei Wang · Simon Du · Jiantao Jiao
- 2022 Spotlight: Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning
Boxiang Lyu · Zhaoran Wang · Mladen Kolar · Zhuoran Yang
- 2022 Spotlight: Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning
Shuang Qiu · Lingxiao Wang · Chenjia Bai · Zhuoran Yang · Zhaoran Wang
- 2021 : Discussion Panel #1
Hang Su · Matthias Hein · Liwei Wang · Sven Gowal · Jan Hendrik Metzen · Henry Liu · Yisen Wang
- 2021 : Invited Talk #1
Liwei Wang
- 2021 Poster: Decentralized Single-Timescale Actor-Critic on Zero-Sum Two-Player Stochastic Games
Hongyi Guo · Zuyue Fu · Zhuoran Yang · Zhaoran Wang
- 2021 Poster: Towards Certifying L-infinity Robustness using Neural Networks with L-inf-dist Neurons
Bohang Zhang · Tianle Cai · Zhou Lu · Di He · Liwei Wang
- 2021 Spotlight: Towards Certifying L-infinity Robustness using Neural Networks with L-inf-dist Neurons
Bohang Zhang · Tianle Cai · Zhou Lu · Di He · Liwei Wang
- 2021 Spotlight: Decentralized Single-Timescale Actor-Critic on Zero-Sum Two-Player Stochastic Games
Hongyi Guo · Zuyue Fu · Zhuoran Yang · Zhaoran Wang
- 2021 Poster: Near-Optimal Representation Learning for Linear Bandits and Linear RL
Jiachen Hu · Xiaoyu Chen · Chi Jin · Lihong Li · Liwei Wang
- 2021 Poster: Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality
Tengyu Xu · Zhuoran Yang · Zhaoran Wang · Yingbin LIANG
- 2021 Poster: Randomized Exploration in Reinforcement Learning with General Value Function Approximation
Haque Ishfaq · Qiwen Cui · Viet Nguyen · Alex Ayoub · Zhuoran Yang · Zhaoran Wang · Doina Precup · Lin Yang
- 2021 Poster: Infinite-Dimensional Optimization for Zero-Sum Games via Variational Transport
Lewis Liu · Yufeng Zhang · Zhuoran Yang · Reza Babanezhad · Zhaoran Wang
- 2021 Spotlight: Infinite-Dimensional Optimization for Zero-Sum Games via Variational Transport
Lewis Liu · Yufeng Zhang · Zhuoran Yang · Reza Babanezhad · Zhaoran Wang
- 2021 Spotlight: Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality
Tengyu Xu · Zhuoran Yang · Zhaoran Wang · Yingbin LIANG
- 2021 Spotlight: Randomized Exploration in Reinforcement Learning with General Value Function Approximation
Haque Ishfaq · Qiwen Cui · Viet Nguyen · Alex Ayoub · Zhuoran Yang · Zhaoran Wang · Doina Precup · Lin Yang
- 2021 Spotlight: Near-Optimal Representation Learning for Linear Bandits and Linear RL
Jiachen Hu · Xiaoyu Chen · Chi Jin · Lihong Li · Liwei Wang
- 2021 Poster: On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP
Tianhao Wu · Yunchang Yang · Simon Du · Liwei Wang
- 2021 Poster: Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions
Shuang Qiu · Xiaohan Wei · Jieping Ye · Zhaoran Wang · Zhuoran Yang
- 2021 Poster: On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game
Shuang Qiu · Jieping Ye · Zhaoran Wang · Zhuoran Yang
- 2021 Poster: Principled Exploration via Optimistic Bootstrapping and Backward Induction
Chenjia Bai · Lingxiao Wang · Lei Han · Jianye Hao · Animesh Garg · Peng Liu · Zhaoran Wang
- 2021 Oral: On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game
Shuang Qiu · Jieping Ye · Zhaoran Wang · Zhuoran Yang
- 2021 Spotlight: On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP
Tianhao Wu · Yunchang Yang · Simon Du · Liwei Wang
- 2021 Spotlight: Principled Exploration via Optimistic Bootstrapping and Backward Induction
Chenjia Bai · Lingxiao Wang · Lei Han · Jianye Hao · Animesh Garg · Peng Liu · Zhaoran Wang
- 2021 Oral: Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions
Shuang Qiu · Xiaohan Wei · Jieping Ye · Zhaoran Wang · Zhuoran Yang
- 2021 Poster: Learning While Playing in Mean-Field Games: Convergence and Optimality
Qiaomin Xie · Zhuoran Yang · Zhaoran Wang · Andreea Minca
- 2021 Poster: Is Pessimism Provably Efficient for Offline RL?
Ying Jin · Zhuoran Yang · Zhaoran Wang
- 2021 Poster: GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training
Tianle Cai · Shengjie Luo · Keyulu Xu · Di He · Tie-Yan Liu · Liwei Wang
- 2021 Spotlight: Is Pessimism Provably Efficient for Offline RL?
Ying Jin · Zhuoran Yang · Zhaoran Wang
- 2021 Spotlight: GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training
Tianle Cai · Shengjie Luo · Keyulu Xu · Di He · Tie-Yan Liu · Liwei Wang
- 2021 Spotlight: Learning While Playing in Mean-Field Games: Convergence and Optimality
Qiaomin Xie · Zhuoran Yang · Zhaoran Wang · Andreea Minca
- 2021 Poster: Risk-Sensitive Reinforcement Learning with Function Approximation: A Debiasing Approach
Yingjie Fei · Zhuoran Yang · Zhaoran Wang
- 2021 Oral: Risk-Sensitive Reinforcement Learning with Function Approximation: A Debiasing Approach
Yingjie Fei · Zhuoran Yang · Zhaoran Wang
- 2020 Poster: On Layer Normalization in the Transformer Architecture
Ruibin Xiong · Yunchang Yang · Di He · Kai Zheng · Shuxin Zheng · Chen Xing · Huishuai Zhang · Yanyan Lan · Liwei Wang · Tie-Yan Liu
- 2020 Poster: Breaking the Curse of Many Agents: Provable Mean Embedding Q-Iteration for Mean-Field Reinforcement Learning
Lingxiao Wang · Zhuoran Yang · Zhaoran Wang
- 2020 Poster: (Locally) Differentially Private Combinatorial Semi-Bandits
Xiaoyu Chen · Kai Zheng · Zixin Zhou · Yunchang Yang · Wei Chen · Liwei Wang
- 2020 Poster: Generative Adversarial Imitation Learning with Neural Network Parameterization: Global Optimality and Convergence Rate
Yufeng Zhang · Qi Cai · Zhuoran Yang · Zhaoran Wang
- 2020 Poster: Provably Efficient Exploration in Policy Optimization
Qi Cai · Zhuoran Yang · Chi Jin · Zhaoran Wang
- 2020 Poster: On the Global Optimality of Model-Agnostic Meta-Learning
Lingxiao Wang · Qi Cai · Zhuoran Yang · Zhaoran Wang
- 2020 Poster: Semiparametric Nonlinear Bipartite Graph Representation Learning with Provable Guarantees
Sen Na · Yuwei Luo · Zhuoran Yang · Zhaoran Wang · Mladen Kolar
- 2019 Poster: Efficient Training of BERT by Progressively Stacking
Linyuan Gong · Di He · Zhuohan Li · Tao Qin · Liwei Wang · Tie-Yan Liu
- 2019 Oral: Efficient Training of BERT by Progressively Stacking
Linyuan Gong · Di He · Zhuohan Li · Tao Qin · Liwei Wang · Tie-Yan Liu
- 2019 Poster: On the statistical rate of nonlinear recovery in generative models with heavy-tailed data
Xiaohan Wei · Zhuoran Yang · Zhaoran Wang
- 2019 Oral: On the statistical rate of nonlinear recovery in generative models with heavy-tailed data
Xiaohan Wei · Zhuoran Yang · Zhaoran Wang
- 2019 Poster: Gradient Descent Finds Global Minima of Deep Neural Networks
Simon Du · Jason Lee · Haochuan Li · Liwei Wang · Xiyu Zhai
- 2019 Oral: Gradient Descent Finds Global Minima of Deep Neural Networks
Simon Du · Jason Lee · Haochuan Li · Liwei Wang · Xiyu Zhai
- 2018 Poster: The Edge Density Barrier: Computational-Statistical Tradeoffs in Combinatorial Inference
Hao Lu · Yuan Cao · Junwei Lu · Han Liu · Zhaoran Wang
- 2018 Oral: The Edge Density Barrier: Computational-Statistical Tradeoffs in Combinatorial Inference
Hao Lu · Yuan Cao · Junwei Lu · Han Liu · Zhaoran Wang
- 2018 Poster: Towards Binary-Valued Gates for Robust LSTM Training
Zhuohan Li · Di He · Fei Tian · Wei Chen · Tao Qin · Liwei Wang · Tie-Yan Liu
- 2018 Oral: Towards Binary-Valued Gates for Robust LSTM Training
Zhuohan Li · Di He · Fei Tian · Wei Chen · Tao Qin · Liwei Wang · Tie-Yan Liu
- 2018 Poster: Dropout Training, Data-dependent Regularization, and Generalization Bounds
Wenlong Mou · Yuchen Zhou · Jun Gao · Liwei Wang
- 2018 Oral: Dropout Training, Data-dependent Regularization, and Generalization Bounds
Wenlong Mou · Yuchen Zhou · Jun Gao · Liwei Wang
- 2017 Poster: Collect at Once, Use Effectively: Making Non-interactive Locally Private Learning Possible
Kai Zheng · Wenlong Mou · Liwei Wang
- 2017 Talk: Collect at Once, Use Effectively: Making Non-interactive Locally Private Learning Possible
Kai Zheng · Wenlong Mou · Liwei Wang