Timezone: »
Poster
Near Optimal Reward-Free Reinforcement Learning
Zhang Zihan · Simon Du · Xiangyang Ji
We study the reward-free reinforcement learning framework, which is particularly suitable for batch reinforcement learning and scenarios where one needs policies for multiple reward functions.
This framework has two phases: in the exploration phase, the agent collects trajectories by interacting with the environment without using any reward signal; in the planning phase, the agent needs to return a near-optimal policy for arbitrary reward functions.
%This framework is suitable for batch RL setting and the setting where there are multiple reward functions of interes
We give a new efficient algorithm, \textbf{S}taged \textbf{S}ampling + \textbf{T}runcated \textbf{P}lanning (\algoname), which interacts with the environment at most $O\left( \frac{S^2A}{\epsilon^2}\poly\log\left(\frac{SAH}{\epsilon}\right) \right)$ episodes in the exploration phase, and guarantees to output a near-optimal policy for arbitrary reward functions in the planning phase, where $S$ is the size of state space, $A$ is the size of action space, $H$ is the planning horizon, and $\epsilon$ is the target accuracy relative to the total reward.
Notably, our sample complexity scales only \emph{logarithmically} with $H$, in contrast to all existing results which scale \emph{polynomially} with $H$.
Furthermore, this bound matches the minimax lower bound $\Omega\left(\frac{S^2A}{\epsilon^2}\right)$ up to logarithmic factors.
Our results rely on three new techniques : 1) A new sufficient condition for the dataset to plan for an $\epsilon$-suboptimal policy
% for any totally bounded reward function
; 2) A new way to plan efficiently under the proposed condition using soft-truncated planning; 3) Constructing extended MDP to maximize the truncated accumulative rewards efficiently.
Author Information
Zhang Zihan (Tsinghua University)
Simon Du (University of Washington)
Xiangyang Ji (Tsinghua University)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Oral: Near Optimal Reward-Free Reinforcement Learning »
Wed. Jul 21st 12:00 -- 12:20 PM Room
More from the Same Authors
-
2021 : Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret »
Jean Tarbouriech · Jean Tarbouriech · Simon Du · Matteo Pirotta · Michal Valko · Alessandro Lazaric -
2023 : Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation »
Qiwen Cui · Kaiqing Zhang · Simon Du -
2023 : Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron »
Weihang Xu · Simon Du -
2023 : Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer »
Yuandong Tian · Yiping Wang · Beidi Chen · Simon Du -
2023 : Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation »
Qiwen Cui · Kaiqing Zhang · Simon Du -
2023 : LabelBench: A Comprehensive Framework for Benchmarking Label-Efficient Learning »
Jifan Zhang · Yifang Chen · Gregory Canal · Stephen Mussmann · Yinglun Zhu · Simon Du · Kevin Jamieson · Robert Nowak -
2023 : Contributed talks 2 »
Simon Du · Wei Huang · Yuandong Tian -
2023 Poster: On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness »
Haotian Ye · Xiaoyu Chen · Liwei Wang · Simon Du -
2023 Poster: Complementary Attention for Multi-Agent Reinforcement Learning »
Jianzhun Shao · Hongchang Zhang · Yun Qu · Chang Liu · Shuncheng He · Yuhang Jiang · Xiangyang Ji -
2023 Poster: Supported Trust Region Optimization for Offline Reinforcement Learning »
Yixiu Mao · Hongchang Zhang · Chen Chen · Yi Xu · Xiangyang Ji -
2023 Oral: On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness »
Haotian Ye · Xiaoyu Chen · Liwei Wang · Simon Du -
2023 Poster: Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments »
Runlong Zhou · Zhang Zihan · Simon Du -
2023 Poster: Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing »
Jikai Jin · Zhiyuan Li · Kaifeng Lyu · Simon Du · Jason Lee -
2023 Poster: No One Idles: Efficient Heterogeneous Federated Learning with Parallel Edge and Server Computation »
Feilong Zhang · Xianming Liu · Shiyi Lin · Gang Wu · Xiong Zhou · Junjun Jiang · Xiangyang Ji -
2023 Poster: Improved Active Multi-Task Representation Learning via Lasso »
Yiping Wang · Yifang Chen · Kevin Jamieson · Simon Du -
2023 Poster: Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes »
Runlong Zhou · Ruosong Wang · Simon Du -
2022 Poster: First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach »
Andrew Wagenmaker · Yifang Chen · Max Simchowitz · Simon Du · Kevin Jamieson -
2022 Poster: Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes »
Andrew Wagenmaker · Yifang Chen · Max Simchowitz · Simon Du · Kevin Jamieson -
2022 Poster: Prototype-Anchored Learning for Learning with Imperfect Annotations »
Xiong Zhou · Xianming Liu · Deming Zhai · Junjun Jiang · Xin Gao · Xiangyang Ji -
2022 Poster: Near-Optimal Algorithms for Autonomous Exploration and Multi-Goal Stochastic Shortest Path »
Haoyuan Cai · Tengyu Ma · Simon Du -
2022 Spotlight: Near-Optimal Algorithms for Autonomous Exploration and Multi-Goal Stochastic Shortest Path »
Haoyuan Cai · Tengyu Ma · Simon Du -
2022 Spotlight: Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes »
Andrew Wagenmaker · Yifang Chen · Max Simchowitz · Simon Du · Kevin Jamieson -
2022 Oral: First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach »
Andrew Wagenmaker · Yifang Chen · Max Simchowitz · Simon Du · Kevin Jamieson -
2022 Poster: Active Multi-Task Representation Learning »
Yifang Chen · Kevin Jamieson · Simon Du -
2022 Poster: Denoised MDPs: Learning World Models Better Than the World Itself »
Tongzhou Wang · Simon Du · Antonio Torralba · Phillip Isola · Amy Zhang · Yuandong Tian -
2022 Spotlight: Prototype-Anchored Learning for Learning with Imperfect Annotations »
Xiong Zhou · Xianming Liu · Deming Zhai · Junjun Jiang · Xin Gao · Xiangyang Ji -
2022 Spotlight: Denoised MDPs: Learning World Models Better Than the World Itself »
Tongzhou Wang · Simon Du · Antonio Torralba · Phillip Isola · Amy Zhang · Yuandong Tian -
2022 Spotlight: Active Multi-Task Representation Learning »
Yifang Chen · Kevin Jamieson · Simon Du -
2022 Poster: Nearly Optimal Policy Optimization with Stable at Any Time Guarantee »
Tianhao Wu · Yunchang Yang · Han Zhong · Liwei Wang · Simon Du · Jiantao Jiao -
2022 Spotlight: Nearly Optimal Policy Optimization with Stable at Any Time Guarantee »
Tianhao Wu · Yunchang Yang · Han Zhong · Liwei Wang · Simon Du · Jiantao Jiao -
2021 Workshop: Workshop on Reinforcement Learning Theory »
Shipra Agrawal · Simon Du · Niao He · Csaba Szepesvari · Lin Yang -
2021 Poster: Asymmetric Loss Functions for Learning with Noisy Labels »
Xiong Zhou · Xianming Liu · Junjun Jiang · Xin Gao · Xiangyang Ji -
2021 Poster: Improved Corruption Robust Algorithms for Episodic Reinforcement Learning »
Yifang Chen · Simon Du · Kevin Jamieson -
2021 Spotlight: Asymmetric Loss Functions for Learning with Noisy Labels »
Xiong Zhou · Xianming Liu · Junjun Jiang · Xin Gao · Xiangyang Ji -
2021 Spotlight: Improved Corruption Robust Algorithms for Episodic Reinforcement Learning »
Yifang Chen · Simon Du · Kevin Jamieson -
2021 Poster: On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP »
Tianhao Wu · Yunchang Yang · Simon Du · Liwei Wang -
2021 Poster: Bilinear Classes: A Structural Framework for Provable Generalization in RL »
Simon Du · Sham Kakade · Jason Lee · Shachar Lovett · Gaurav Mahajan · Wen Sun · Ruosong Wang -
2021 Spotlight: On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP »
Tianhao Wu · Yunchang Yang · Simon Du · Liwei Wang -
2021 Oral: Bilinear Classes: A Structural Framework for Provable Generalization in RL »
Simon Du · Sham Kakade · Jason Lee · Shachar Lovett · Gaurav Mahajan · Wen Sun · Ruosong Wang -
2021 Poster: Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity »
Zhang Zihan · Yuan Zhou · Xiangyang Ji -
2021 Spotlight: Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity »
Zhang Zihan · Yuan Zhou · Xiangyang Ji