Poster
Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs
Junkai Zhang · Weitong Zhang · Quanquan Gu
We study reward-free reinforcement learning (RL) with linear function approximation, where the agent works in two phases: (1) in the exploration phase, the agent interacts with the environment but cannot access the reward; and (2) in the planning phase, the agent is given a reward function and is expected to find a near-optimal policy based on samples collected in the exploration phase. The sample complexities of existing reward-free algorithms have a polynomial dependence on the planning horizon, which makes them intractable for RL problems with long planning horizons. In this paper, we propose a new reward-free algorithm for learning linear mixture Markov decision processes (MDPs), where the transition probability can be parameterized as a linear combination of known feature mappings. At the core of our algorithm is uncertainty-weighted value-targeted regression with exploration-driven pseudo-reward and a high-order moment estimator for the aleatoric and epistemic uncertainties. When the total reward is bounded by $1$, we show that our algorithm only needs to explore $\tilde O\left(d^2\varepsilon^{-2}\right)$ episodes to find an $\varepsilon$-optimal policy, where $d$ is the dimension of the feature mapping. The sample complexity of our algorithm has only a polylogarithmic dependence on the planning horizon and is therefore "horizon-free". In addition, we provide an $\Omega\left(d^2\varepsilon^{-2}\right)$ sample complexity lower bound, which matches the sample complexity of our algorithm up to logarithmic factors, suggesting that our algorithm is optimal.
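As a rough illustration of the linear mixture structure mentioned in the abstract, the sketch below builds a small tabular MDP whose transition kernel is a linear combination of $d$ known base kernels, $P(s' \mid s, a) = \langle \phi(s' \mid s, a), \theta \rangle$. It is not the authors' algorithm or code: the state and action counts, the random base kernels, the choice of $\theta$ on the simplex, and the names `make_linear_mixture_mdp` and `expected_value_feature` are all assumptions made for this example. It also checks the identity that value-targeted regression relies on in this setting, namely that the expected next-state value is linear in $\theta$.

```python
import numpy as np


def make_linear_mixture_mdp(num_states=4, num_actions=2, d=3, seed=0):
    """Build a toy linear mixture MDP (all sizes are illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    # phi[i, s, a, :] is the i-th known base kernel: a distribution over next states.
    phi = rng.random((d, num_states, num_actions, num_states))
    phi /= phi.sum(axis=-1, keepdims=True)
    # theta plays the role of the unknown parameter. Placing it on the simplex
    # is an assumption that keeps the mixture a valid transition kernel.
    theta = rng.random(d)
    theta /= theta.sum()
    # P[s, a, s'] = sum_i theta[i] * phi[i, s, a, s']
    P = np.einsum("i,isat->sat", theta, phi)
    return phi, theta, P


def expected_value_feature(phi, V):
    """Return phi_V with phi_V[s, a, i] = sum_{s'} phi[i, s, a, s'] * V[s']."""
    return np.einsum("isat,t->sai", phi, V)


phi, theta, P = make_linear_mixture_mdp()
V = np.linspace(0.0, 1.0, P.shape[-1])  # an arbitrary value function on states
phi_V = expected_value_feature(phi, V)

lhs = P @ V          # E[V(s') | s, a] computed from the transition kernel
rhs = phi_V @ theta  # the same quantity written as <phi_V(s, a), theta>
print(np.allclose(lhs, rhs))
```

Because the expected value is linear in $\theta$, an estimate of $\theta$ can in principle be obtained by regressing observed next-state values on `phi_V`. The weighting and pseudo-reward design described in the abstract go beyond this sketch.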
Author Information
Junkai Zhang (University of California, Los Angeles)
Weitong Zhang (University of California, Los Angeles)
Quanquan Gu (University of California, Los Angeles)
More from the Same Authors
-
2021 : Benign Overfitting in Adversarially Robust Linear Classification »
Jinghui Chen · Yuan Cao · Quanquan Gu -
2021 : Risk Bounds for Over-parameterized Maximum Margin Classification on Sub-Gaussian Mixtures »
Yuan Cao · Quanquan Gu · Mikhail Belkin -
2021 : Nearly Minimax Optimal Regret for Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation »
Yue Wu · Dongruo Zhou · Quanquan Gu -
2021 : Nearly Optimal Regret for Learning Adversarial MDPs with Linear Function Approximation »
Jiafan He · Dongruo Zhou · Quanquan Gu -
2021 : Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs »
Jiafan He · Dongruo Zhou · Quanquan Gu -
2021 : Almost Optimal Algorithms for Two-player Markov Games with Linear Function Approximation »
Zixiang Chen · Dongruo Zhou · Quanquan Gu -
2022 : The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift »
Jingfeng Wu · Difan Zou · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2023 : Robust Learning with Progressive Data Expansion Against Spurious Correlation »
Yihe Deng · Yu Yang · Baharan Mirzasoleiman · Quanquan Gu -
2023 : DiffMol: 3D Structured Molecule Generation with Discrete Denoising Diffusion Probabilistic Models »
Weitong Zhang · Xiaoyun Wang · Justin Smith · Joe Eaton · Brad Rees · Quanquan Gu -
2023 : Borda Regret Minimization for Generalized Linear Dueling Bandits »
Yue Wu · Tao Jin · Qiwei Di · Hao Lou · Farzad Farnoud · Quanquan Gu -
2023 Poster: DecompDiff: Diffusion Models with Decomposed Priors for Structure-Based Drug Design »
Jiaqi Guan · Xiangxin Zhou · Yuwei Yang · Yu Bao · Jian Peng · Jianzhu Ma · Qiang Liu · Liang Wang · Quanquan Gu -
2023 Poster: Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes »
Jiafan He · Heyang Zhao · Dongruo Zhou · Quanquan Gu -
2023 Poster: Cooperative Multi-Agent Reinforcement Learning: Asynchronous Communication and Linear Function Approximation »
Yifei Min · Jiafan He · Tianhao Wang · Quanquan Gu -
2023 Poster: Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron »
Jingfeng Wu · Difan Zou · Zixiang Chen · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2023 Poster: Benign Overfitting in Two-layer ReLU Convolutional Neural Networks »
Yiwen Kou · Zixiang Chen · Yuanzhou Chen · Quanquan Gu -
2023 Poster: Nesterov Meets Optimism: Rate-Optimal Separable Minimax Optimization »
Chris Junchi Li · Huizhuo Yuan · Gauthier Gidel · Quanquan Gu · Michael Jordan -
2023 Oral: Structure-informed Language Models Are Protein Designers »
Zaixiang Zheng · Yifan Deng · Dongyu Xue · Yi Zhou · Fei YE · Quanquan Gu -
2023 Poster: Personalized Federated Learning under Mixture of Distributions »
Yue Wu · Shuaicheng Zhang · Wenchao Yu · Yanchi Liu · Quanquan Gu · Dawei Zhou · Haifeng Chen · Wei Cheng -
2023 Poster: Optimal Online Generalized Linear Regression with Stochastic Noise and Its Application to Heteroscedastic Bandits »
Heyang Zhao · Dongruo Zhou · Jiafan He · Quanquan Gu -
2023 Poster: Structure-informed Language Models Are Protein Designers »
Zaixiang Zheng · Yifan Deng · Dongyu Xue · Yi Zhou · Fei YE · Quanquan Gu -
2023 Poster: The Benefits of Mixup for Feature Learning »
Difan Zou · Yuan Cao · Yuanzhi Li · Quanquan Gu -
2023 Poster: Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path »
Qiwei Di · Jiafan He · Dongruo Zhou · Quanquan Gu -
2023 Poster: On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits »
Weitong Zhang · Jiafan He · Zhiyuan Fan · Quanquan Gu -
2023 Poster: Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes »
Chenlu Ye · Wei Xiong · Quanquan Gu · Tong Zhang -
2022 Poster: Learning Stochastic Shortest Path with Linear Function Approximation »
Yifei Min · Jiafan He · Tianhao Wang · Quanquan Gu -
2022 Spotlight: Learning Stochastic Shortest Path with Linear Function Approximation »
Yifei Min · Jiafan He · Tianhao Wang · Quanquan Gu -
2022 Poster: Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression »
Jingfeng Wu · Difan Zou · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2022 Poster: On the Sample Complexity of Learning Infinite-horizon Discounted Linear Kernel MDPs »
Yuanzhou Chen · Jiafan He · Quanquan Gu -
2022 Oral: Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression »
Jingfeng Wu · Difan Zou · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2022 Spotlight: On the Sample Complexity of Learning Infinite-horizon Discounted Linear Kernel MDPs »
Yuanzhou Chen · Jiafan He · Quanquan Gu -
2022 Poster: Dimension-free Complexity Bounds for High-order Nonconvex Finite-sum Optimization »
Dongruo Zhou · Quanquan Gu -
2022 Spotlight: Dimension-free Complexity Bounds for High-order Nonconvex Finite-sum Optimization »
Dongruo Zhou · Quanquan Gu -
2021 : Stochastic Variance-Reduced High-order Optimization for Nonconvex Optimization »
Quanquan Gu -
2021 Workshop: Over-parameterization: Pitfalls and Opportunities »
Yasaman Bahri · Quanquan Gu · Amin Karbasi · Hanie Sedghi -
2021 Poster: On the Convergence of Hamiltonian Monte Carlo with Stochastic Gradients »
Difan Zou · Quanquan Gu -
2021 Spotlight: On the Convergence of Hamiltonian Monte Carlo with Stochastic Gradients »
Difan Zou · Quanquan Gu -
2021 Poster: Almost Optimal Anytime Algorithm for Batched Multi-Armed Bandits »
Tianyuan Jin · Jing Tang · Pan Xu · Keke Huang · Xiaokui Xiao · Quanquan Gu -
2021 Poster: MOTS: Minimax Optimal Thompson Sampling »
Tianyuan Jin · Pan Xu · Jieming Shi · Xiaokui Xiao · Quanquan Gu -
2021 Poster: Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping »
Dongruo Zhou · Jiafan He · Quanquan Gu -
2021 Poster: Logarithmic Regret for Reinforcement Learning with Linear Function Approximation »
Jiafan He · Dongruo Zhou · Quanquan Gu -
2021 Spotlight: Almost Optimal Anytime Algorithm for Batched Multi-Armed Bandits »
Tianyuan Jin · Jing Tang · Pan Xu · Keke Huang · Xiaokui Xiao · Quanquan Gu -
2021 Spotlight: Logarithmic Regret for Reinforcement Learning with Linear Function Approximation »
Jiafan He · Dongruo Zhou · Quanquan Gu -
2021 Spotlight: MOTS: Minimax Optimal Thompson Sampling »
Tianyuan Jin · Pan Xu · Jieming Shi · Xiaokui Xiao · Quanquan Gu -
2021 Spotlight: Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping »
Dongruo Zhou · Jiafan He · Quanquan Gu -
2021 Poster: Provable Robustness of Adversarial Training for Learning Halfspaces with Noise »
Difan Zou · Spencer Frei · Quanquan Gu -
2021 Poster: Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins »
Spencer Frei · Yuan Cao · Quanquan Gu -
2021 Poster: Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise »
Spencer Frei · Yuan Cao · Quanquan Gu -
2021 Spotlight: Provable Robustness of Adversarial Training for Learning Halfspaces with Noise »
Difan Zou · Spencer Frei · Quanquan Gu -
2021 Oral: Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins »
Spencer Frei · Yuan Cao · Quanquan Gu -
2021 Spotlight: Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise »
Spencer Frei · Yuan Cao · Quanquan Gu -
2020 Poster: A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation »
Pan Xu · Quanquan Gu -
2020 Poster: Optimization Theory for ReLU Neural Networks Trained with Normalization Layers »
Yonatan Dukler · Quanquan Gu · Guido Montufar -
2020 Poster: Neural Contextual Bandits with UCB-based Exploration »
Dongruo Zhou · Lihong Li · Quanquan Gu -
2019 Poster: On the Convergence and Robustness of Adversarial Training »
Yisen Wang · Xingjun Ma · James Bailey · Jinfeng Yi · Bowen Zhou · Quanquan Gu -
2019 Oral: On the Convergence and Robustness of Adversarial Training »
Yisen Wang · Xingjun Ma · James Bailey · Jinfeng Yi · Bowen Zhou · Quanquan Gu -
2019 Poster: Lower Bounds for Smooth Nonconvex Finite-Sum Optimization »
Dongruo Zhou · Quanquan Gu -
2019 Oral: Lower Bounds for Smooth Nonconvex Finite-Sum Optimization »
Dongruo Zhou · Quanquan Gu -
2018 Poster: Fast and Sample Efficient Inductive Matrix Completion via Multi-Phase Procrustes Flow »
Xiao Zhang · Simon Du · Quanquan Gu -
2018 Poster: Continuous and Discrete-time Accelerated Stochastic Mirror Descent for Strongly Convex Functions »
Pan Xu · Tianhao Wang · Quanquan Gu -
2018 Oral: Fast and Sample Efficient Inductive Matrix Completion via Multi-Phase Procrustes Flow »
Xiao Zhang · Simon Du · Quanquan Gu -
2018 Oral: Continuous and Discrete-time Accelerated Stochastic Mirror Descent for Strongly Convex Functions »
Pan Xu · Tianhao Wang · Quanquan Gu -
2018 Poster: A Primal-Dual Analysis of Global Optimality in Nonconvex Low-Rank Matrix Recovery »
Xiao Zhang · Lingxiao Wang · Yaodong Yu · Quanquan Gu -
2018 Poster: Stochastic Variance-Reduced Hamilton Monte Carlo Methods »
Difan Zou · Pan Xu · Quanquan Gu -
2018 Oral: Stochastic Variance-Reduced Hamilton Monte Carlo Methods »
Difan Zou · Pan Xu · Quanquan Gu -
2018 Oral: A Primal-Dual Analysis of Global Optimality in Nonconvex Low-Rank Matrix Recovery »
Xiao Zhang · Lingxiao Wang · Yaodong Yu · Quanquan Gu -
2018 Poster: Stochastic Variance-Reduced Cubic Regularized Newton Method »
Dongruo Zhou · Pan Xu · Quanquan Gu -
2018 Poster: Covariate Adjusted Precision Matrix Estimation via Nonconvex Optimization »
Jinghui Chen · Pan Xu · Lingxiao Wang · Jian Ma · Quanquan Gu -
2018 Oral: Stochastic Variance-Reduced Cubic Regularized Newton Method »
Dongruo Zhou · Pan Xu · Quanquan Gu -
2018 Oral: Covariate Adjusted Precision Matrix Estimation via Nonconvex Optimization »
Jinghui Chen · Pan Xu · Lingxiao Wang · Jian Ma · Quanquan Gu -
2017 Poster: Uncertainty Assessment and False Discovery Rate Control in High-Dimensional Granger Causal Inference »
Aditya Chaudhry · Pan Xu · Quanquan Gu -
2017 Poster: High-Dimensional Variance-Reduced Stochastic Gradient Expectation-Maximization Algorithm »
Rongda Zhu · Lingxiao Wang · Chengxiang Zhai · Quanquan Gu -
2017 Poster: Robust Gaussian Graphical Model Estimation with Arbitrary Corruption »
Lingxiao Wang · Quanquan Gu -
2017 Talk: High-Dimensional Variance-Reduced Stochastic Gradient Expectation-Maximization Algorithm »
Rongda Zhu · Lingxiao Wang · Chengxiang Zhai · Quanquan Gu -
2017 Talk: Robust Gaussian Graphical Model Estimation with Arbitrary Corruption »
Lingxiao Wang · Quanquan Gu -
2017 Talk: Uncertainty Assessment and False Discovery Rate Control in High-Dimensional Granger Causal Inference »
Aditya Chaudhry · Pan Xu · Quanquan Gu -
2017 Poster: A Unified Variance Reduction-Based Framework for Nonconvex Low-Rank Matrix Recovery »
Lingxiao Wang · Xiao Zhang · Quanquan Gu -
2017 Talk: A Unified Variance Reduction-Based Framework for Nonconvex Low-Rank Matrix Recovery »
Lingxiao Wang · Xiao Zhang · Quanquan Gu