Timezone: »
Oral
First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach
Andrew Wagenmaker · Yifang Chen · Max Simchowitz · Simon Du · Kevin Jamieson
Thu Jul 21 07:30 AM -- 07:50 AM (PDT) @ Ballroom 3 & 4
Obtaining first-order regret bounds---regret bounds scaling not as the worst-case but with some measure of the performance of the optimal policy on a given instance---is a core question in sequential decision-making. While such bounds exist in many settings, they have proven elusive in reinforcement learning with large state spaces. In this work we address this gap, and show that it is possible to obtain regret scaling as $\widetilde{\mathcal{O}}(\sqrt{d^3 H^3 \cdot V_1^\star \cdot K} + d^{3.5}H^3\log K )$ in reinforcement learning with large state spaces, namely the linear MDP setting. Here $V_1^\star$ is the value of the optimal policy and $K$ is the number of episodes. We demonstrate that existing techniques based on least squares estimation are insufficient to obtain this result, and instead develop a novel robust self-normalized concentration bound based on the robust Catoni mean estimator, which may be of independent interest.
Author Information
Andrew Wagenmaker (University of Washington)
Yifang Chen (University of Washington)
Max Simchowitz (MIT)
Simon Du (University of Washington)
Kevin Jamieson (University of Washington)
Related Events (a corresponding poster, oral, or spotlight)
-
2022 Poster: First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach »
Thu. Jul 21st through Fri the 22nd Room Hall E #1124
More from the Same Authors
-
2021 : Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret »
Jean Tarbouriech · Jean Tarbouriech · Simon Du · Matteo Pirotta · Michal Valko · Alessandro Lazaric -
2023 : Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation »
Qiwen Cui · Kaiqing Zhang · Simon Du -
2023 : Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron »
Weihang Xu · Simon Du -
2023 : Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer »
Yuandong Tian · Yiping Wang · Beidi Chen · Simon Du -
2023 : Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation »
Qiwen Cui · Kaiqing Zhang · Simon Du -
2023 : On the Imitation of Non-Markovian Demonstrations: From Low-Level Stability to High-Level Planning »
Adam Block · Daniel Pfrommer · Max Simchowitz -
2023 : LabelBench: A Comprehensive Framework for Benchmarking Label-Efficient Learning »
Jifan Zhang · Yifang Chen · Gregory Canal · Stephen Mussmann · Yinglun Zhu · Simon Du · Kevin Jamieson · Robert Nowak -
2023 : Contributed talks 2 »
Simon Du · Wei Huang · Yuandong Tian -
2023 Poster: On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness »
Haotian Ye · Xiaoyu Chen · Liwei Wang · Simon Du -
2023 Poster: The Power of Learned Locally Linear Models for Nonlinear Policy Optimization »
Daniel Pfrommer · Max Simchowitz · Tyler Westenbroek · Nikolai Matni · Stephen Tu -
2023 Poster: Statistical Learning under Heterogenous Distribution Shift »
Max Simchowitz · Anurag Ajay · Pulkit Agrawal · Akshay Krishnamurthy -
2023 Poster: Leveraging Offline Data in Online Reinforcement Learning »
Andrew Wagenmaker · Aldo Pacchiano -
2023 Oral: On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness »
Haotian Ye · Xiaoyu Chen · Liwei Wang · Simon Du -
2023 Poster: Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments »
Runlong Zhou · Zhang Zihan · Simon Du -
2023 Poster: Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing »
Jikai Jin · Zhiyuan Li · Kaifeng Lyu · Simon Du · Jason Lee -
2023 Poster: Improved Active Multi-Task Representation Learning via Lasso »
Yiping Wang · Yifang Chen · Kevin Jamieson · Simon Du -
2023 Poster: Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes »
Runlong Zhou · Ruosong Wang · Simon Du -
2022 Poster: Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes »
Andrew Wagenmaker · Yifang Chen · Max Simchowitz · Simon Du · Kevin Jamieson -
2022 Poster: Do Differentiable Simulators Give Better Policy Gradients? »
Hyung Ju Suh · Max Simchowitz · Kaiqing Zhang · Russ Tedrake -
2022 Poster: Near-Optimal Algorithms for Autonomous Exploration and Multi-Goal Stochastic Shortest Path »
Haoyuan Cai · Tengyu Ma · Simon Du -
2022 Oral: Do Differentiable Simulators Give Better Policy Gradients? »
Hyung Ju Suh · Max Simchowitz · Kaiqing Zhang · Russ Tedrake -
2022 Spotlight: Near-Optimal Algorithms for Autonomous Exploration and Multi-Goal Stochastic Shortest Path »
Haoyuan Cai · Tengyu Ma · Simon Du -
2022 Spotlight: Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes »
Andrew Wagenmaker · Yifang Chen · Max Simchowitz · Simon Du · Kevin Jamieson -
2022 Poster: Active Multi-Task Representation Learning »
Yifang Chen · Kevin Jamieson · Simon Du -
2022 Poster: Denoised MDPs: Learning World Models Better Than the World Itself »
Tongzhou Wang · Simon Du · Antonio Torralba · Phillip Isola · Amy Zhang · Yuandong Tian -
2022 Spotlight: Denoised MDPs: Learning World Models Better Than the World Itself »
Tongzhou Wang · Simon Du · Antonio Torralba · Phillip Isola · Amy Zhang · Yuandong Tian -
2022 Spotlight: Active Multi-Task Representation Learning »
Yifang Chen · Kevin Jamieson · Simon Du -
2022 Poster: Nearly Optimal Policy Optimization with Stable at Any Time Guarantee »
Tianhao Wu · Yunchang Yang · Han Zhong · Liwei Wang · Simon Du · Jiantao Jiao -
2022 Spotlight: Nearly Optimal Policy Optimization with Stable at Any Time Guarantee »
Tianhao Wu · Yunchang Yang · Han Zhong · Liwei Wang · Simon Du · Jiantao Jiao -
2021 Workshop: Workshop on Reinforcement Learning Theory »
Shipra Agrawal · Simon Du · Niao He · Csaba Szepesvari · Lin Yang -
2021 Poster: Improved Algorithms for Agnostic Pool-based Active Classification »
Julian Katz-Samuels · Jifan Zhang · Lalit Jain · Kevin Jamieson -
2021 Spotlight: Improved Algorithms for Agnostic Pool-based Active Classification »
Julian Katz-Samuels · Jifan Zhang · Lalit Jain · Kevin Jamieson -
2021 Poster: Improved Corruption Robust Algorithms for Episodic Reinforcement Learning »
Yifang Chen · Simon Du · Kevin Jamieson -
2021 Poster: Task-Optimal Exploration in Linear Dynamical Systems »
Andrew Wagenmaker · Max Simchowitz · Kevin Jamieson -
2021 Oral: Task-Optimal Exploration in Linear Dynamical Systems »
Andrew Wagenmaker · Max Simchowitz · Kevin Jamieson -
2021 Spotlight: Improved Corruption Robust Algorithms for Episodic Reinforcement Learning »
Yifang Chen · Simon Du · Kevin Jamieson -
2021 Poster: Near Optimal Reward-Free Reinforcement Learning »
Zhang Zihan · Simon Du · Xiangyang Ji -
2021 Poster: On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP »
Tianhao Wu · Yunchang Yang · Simon Du · Liwei Wang -
2021 Poster: Bilinear Classes: A Structural Framework for Provable Generalization in RL »
Simon Du · Sham Kakade · Jason Lee · Shachar Lovett · Gaurav Mahajan · Wen Sun · Ruosong Wang -
2021 Poster: High-dimensional Experimental Design and Kernel Bandits »
Romain Camilleri · Kevin Jamieson · Julian Katz-Samuels -
2021 Oral: High-dimensional Experimental Design and Kernel Bandits »
Romain Camilleri · Kevin Jamieson · Julian Katz-Samuels -
2021 Spotlight: On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP »
Tianhao Wu · Yunchang Yang · Simon Du · Liwei Wang -
2021 Oral: Bilinear Classes: A Structural Framework for Provable Generalization in RL »
Simon Du · Sham Kakade · Jason Lee · Shachar Lovett · Gaurav Mahajan · Wen Sun · Ruosong Wang -
2021 Oral: Near Optimal Reward-Free Reinforcement Learning »
Zhang Zihan · Simon Du · Xiangyang Ji -
2020 Poster: Estimating the Number and Effect Sizes of Non-null Hypotheses »
Jennifer Brennan · Ramya Korlakai Vinayak · Kevin Jamieson -
2018 Poster: Firing Bandits: Optimizing Crowdfunding »
Lalit Jain · Kevin Jamieson -
2018 Oral: Firing Bandits: Optimizing Crowdfunding »
Lalit Jain · Kevin Jamieson