Timezone: »
Poster
Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes
Runlong Zhou · Ruosong Wang · Simon Du
We study regret minimization for reinforcement learning (RL) in Latent Markov Decision Processes (LMDPs) with context in hindsight. We design a novel model-based algorithmic framework which can be instantiated with both a model-optimistic and a value-optimistic solver. We prove an $\tilde{O}(\sqrt{\mathsf{Var}^\star M \Gamma S A K})$ regret bound where $\tilde{O}$ hides logarithm factors, $M$ is the number of contexts, $S$ is the number of states, $A$ is the number of actions, $K$ is the number of episodes, $\Gamma \le S$ is the maximum transition degree of any state-action pair, and $\mathsf{Var}^\star$ is a variance quantity describing the determinism of the LMDP. The regret bound only scales logarithmically with the planning horizon, thus yielding the first (nearly) horizon-free regret bound for LMDP. This is also the first problem-dependent regret bound for LMDP. Key in our proof is an analysis of the total variance of alpha vectors (a generalization of value functions), which is handled with a truncation method. We complement our positive result with a novel $\Omega(\sqrt{\mathsf{Var}^\star M S A K})$ regret lower bound with $\Gamma = 2$, which shows our upper bound minimax optimal when $\Gamma$ is a constant for the class of variance-bounded LMDPs. Our lower bound relies on new constructions of hard instances and an argument inspired by the symmetrization technique from theoretical computer science, both of which are technically different from existing lower bound proof for MDPs, and thus can be of independent interest.
Author Information
Runlong Zhou (CSE, University of Washington)
Ruosong Wang (Carnegie Mellon University)
Simon Du (University of Washington)
More from the Same Authors
-
2021 : Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret »
Jean Tarbouriech · Jean Tarbouriech · Simon Du · Matteo Pirotta · Michal Valko · Alessandro Lazaric -
2021 : Online Sub-Sampling for Reinforcement Learning with General Function Approximation »
Dingwen Kong · Ruslan Salakhutdinov · Ruosong Wang · Lin Yang -
2023 : Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation »
Qiwen Cui · Kaiqing Zhang · Simon Du -
2023 : Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron »
Weihang Xu · Simon Du -
2023 : Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer »
Yuandong Tian · Yiping Wang · Beidi Chen · Simon Du -
2023 : Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation »
Qiwen Cui · Kaiqing Zhang · Simon Du -
2023 : LabelBench: A Comprehensive Framework for Benchmarking Label-Efficient Learning »
Jifan Zhang · Yifang Chen · Gregory Canal · Stephen Mussmann · Yinglun Zhu · Simon Du · Kevin Jamieson · Robert Nowak -
2023 : Contributed talks 2 »
Simon Du · Wei Huang · Yuandong Tian -
2023 Poster: On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness »
Haotian Ye · Xiaoyu Chen · Liwei Wang · Simon Du -
2023 Oral: On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness »
Haotian Ye · Xiaoyu Chen · Liwei Wang · Simon Du -
2023 Poster: Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments »
Runlong Zhou · Zhang Zihan · Simon Du -
2023 Poster: Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing »
Jikai Jin · Zhiyuan Li · Kaifeng Lyu · Simon Du · Jason Lee -
2023 Poster: Improved Active Multi-Task Representation Learning via Lasso »
Yiping Wang · Yifang Chen · Kevin Jamieson · Simon Du -
2022 Poster: First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach »
Andrew Wagenmaker · Yifang Chen · Max Simchowitz · Simon Du · Kevin Jamieson -
2022 Poster: Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes »
Andrew Wagenmaker · Yifang Chen · Max Simchowitz · Simon Du · Kevin Jamieson -
2022 Poster: Near-Optimal Algorithms for Autonomous Exploration and Multi-Goal Stochastic Shortest Path »
Haoyuan Cai · Tengyu Ma · Simon Du -
2022 Spotlight: Near-Optimal Algorithms for Autonomous Exploration and Multi-Goal Stochastic Shortest Path »
Haoyuan Cai · Tengyu Ma · Simon Du -
2022 Spotlight: Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes »
Andrew Wagenmaker · Yifang Chen · Max Simchowitz · Simon Du · Kevin Jamieson -
2022 Oral: First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach »
Andrew Wagenmaker · Yifang Chen · Max Simchowitz · Simon Du · Kevin Jamieson -
2022 Poster: Active Multi-Task Representation Learning »
Yifang Chen · Kevin Jamieson · Simon Du -
2022 Poster: Denoised MDPs: Learning World Models Better Than the World Itself »
Tongzhou Wang · Simon Du · Antonio Torralba · Phillip Isola · Amy Zhang · Yuandong Tian -
2022 Spotlight: Denoised MDPs: Learning World Models Better Than the World Itself »
Tongzhou Wang · Simon Du · Antonio Torralba · Phillip Isola · Amy Zhang · Yuandong Tian -
2022 Spotlight: Active Multi-Task Representation Learning »
Yifang Chen · Kevin Jamieson · Simon Du -
2022 Poster: Nearly Optimal Policy Optimization with Stable at Any Time Guarantee »
Tianhao Wu · Yunchang Yang · Han Zhong · Liwei Wang · Simon Du · Jiantao Jiao -
2022 Spotlight: Nearly Optimal Policy Optimization with Stable at Any Time Guarantee »
Tianhao Wu · Yunchang Yang · Han Zhong · Liwei Wang · Simon Du · Jiantao Jiao -
2021 Workshop: Workshop on Reinforcement Learning Theory »
Shipra Agrawal · Simon Du · Niao He · Csaba Szepesvari · Lin Yang -
2021 Poster: Improved Corruption Robust Algorithms for Episodic Reinforcement Learning »
Yifang Chen · Simon Du · Kevin Jamieson -
2021 Spotlight: Improved Corruption Robust Algorithms for Episodic Reinforcement Learning »
Yifang Chen · Simon Du · Kevin Jamieson -
2021 Poster: Near Optimal Reward-Free Reinforcement Learning »
Zhang Zihan · Simon Du · Xiangyang Ji -
2021 Poster: On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP »
Tianhao Wu · Yunchang Yang · Simon Du · Liwei Wang -
2021 Poster: Bilinear Classes: A Structural Framework for Provable Generalization in RL »
Simon Du · Sham Kakade · Jason Lee · Shachar Lovett · Gaurav Mahajan · Wen Sun · Ruosong Wang -
2021 Poster: Instabilities of Offline RL with Pre-Trained Neural Representation »
Ruosong Wang · Yifan Wu · Ruslan Salakhutdinov · Sham Kakade -
2021 Spotlight: Instabilities of Offline RL with Pre-Trained Neural Representation »
Ruosong Wang · Yifan Wu · Ruslan Salakhutdinov · Sham Kakade -
2021 Spotlight: On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP »
Tianhao Wu · Yunchang Yang · Simon Du · Liwei Wang -
2021 Oral: Bilinear Classes: A Structural Framework for Provable Generalization in RL »
Simon Du · Sham Kakade · Jason Lee · Shachar Lovett · Gaurav Mahajan · Wen Sun · Ruosong Wang -
2021 Oral: Near Optimal Reward-Free Reinforcement Learning »
Zhang Zihan · Simon Du · Xiangyang Ji -
2020 Poster: Nearly Linear Row Sampling Algorithm for Quantile Regression »
Yi Li · Ruosong Wang · Lin Yang · Hanrui Zhang -
2019 Poster: Dimensionality Reduction for Tukey Regression »
Kenneth Clarkson · Ruosong Wang · David Woodruff -
2019 Oral: Dimensionality Reduction for Tukey Regression »
Kenneth Clarkson · Ruosong Wang · David Woodruff -
2019 Poster: Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks »
Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Ruosong Wang -
2019 Oral: Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks »
Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Ruosong Wang