Timezone: »
Gap-Dependent Unsupervised Exploration for Reinforcement Learning
Jingfeng Wu · Vladimir Braverman · Lin Yang
For the problem of task-agnostic reinforcement learning (RL), an agent first collects samples from an unknown environment without the supervision of reward signals, then is revealed with a reward and is asked to compute a corresponding near-optimal policy. Existing approaches mainly concern the worst-case scenarios, in which no structural information of the reward/transition-dynamics is utilized. Therefore the best sample upper bound is $\propto\widetilde{\mathcal{O}}(1/\epsilon^2)$, where $\epsilon>0$ is the target accuracy of the obtained policy, and can be overly pessimistic. To tackle this issue, we provide an efficient algorithm that utilizes a gap parameter, $\rho>0$, to reduce the amount of exploration. In particular, for an unknown finite-horizon Markov decision process, the algorithm takes only $\widetilde{\mathcal{O}} (1/\epsilon \cdot (H^3SA / \rho + H^4 S^2 A) )$ episodes of exploration, and is able to obtain an $\epsilon$-optimal policy for a post-revealed reward with sub-optimality gap at least $\rho$, where $S$ is the number of states, $A$ is the number of actions, and $H$ is the length of the horizon, obtaining a nearly \emph{quadratic saving} in terms of $\epsilon$. We show that, information-theoretically, this bound is nearly tight for $\rho < \Theta(1/(HS))$ and $H>1$. We further show that $\propto\widetilde{\mathcal{O}}(1)$ sample bound is possible for $H=1$ (i.e., multi-armed bandit) or with a sampling simulator, establishing a stark separation between those settings and the RL setting.
Author Information
Jingfeng Wu (Johns Hopkins University)
Vladimir Braverman (Johns Hopkins University)
Lin Yang (UCLA)
More from the Same Authors
-
2021 : Adversarial Robustness of Streaming Algorithms through Importance Sampling »
Vladimir Braverman · Avinatan Hasidim · Yossi Matias · Mariano Schain · Sandeep Silwal · Samson Zhou -
2021 : Bi-directional Adaptive Communication for Heterogenous Distributed Learning »
Dmitrii Avdiukhin · Vladimir Braverman -
2021 : Online Sub-Sampling for Reinforcement Learning with General Function Approximation »
Dingwen Kong · Ruslan Salakhutdinov · Ruosong Wang · Lin Yang -
2021 : Solving Multi-Arm Bandit Using a Few Bits of Communication »
Osama Hanna · Lin Yang · Christina Fragouli -
2022 : Provably Correct SGD-based Exploration for Linear Bandit »
Jialin Dong · Lin Yang -
2022 : Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning »
Dingwen Kong · Lin Yang -
2022 : The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift »
Jingfeng Wu · Difan Zou · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2023 Poster: Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron »
Jingfeng Wu · Difan Zou · Zixiang Chen · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2023 Poster: Does Sparsity Help in Learning Misspecified Linear Bandits? »
Jialin Dong · Lin Yang -
2023 Poster: Provable Data Subset Selection For Efficient Neural Networks Training »
Morad Tukan · Samson Zhou · Alaa Maalouf · Daniela Rus · Vladimir Braverman · Dan Feldman -
2023 Poster: AutoCoreset: An Automatic Practical Coreset Construction Framework »
Alaa Maalouf · Morad Tukan · Vladimir Braverman · Daniela Rus -
2023 Poster: Distributed Contextual Linear Bandits with Minimax Optimal Communication Cost »
Sanae Amani · Tor Lattimore · Andras Gyorgy · Lin Yang -
2023 Poster: Horizon-free Learning for Markov Decision Processes and Games: Stochastically Bounded Rewards and Improved Bounds »
Shengshi Li · Lin Yang -
2023 Poster: Low-Switching Policy Gradient with Exploration via Online Sensitivity Sampling »
Yunfan Li · Yiran Wang · Yu Cheng · Lin Yang -
2022 Poster: On Improving Model-Free Algorithms for Decentralized Multi-Agent Reinforcement Learning »
Weichao Mao · Lin Yang · Kaiqing Zhang · Tamer Basar -
2022 Spotlight: On Improving Model-Free Algorithms for Decentralized Multi-Agent Reinforcement Learning »
Weichao Mao · Lin Yang · Kaiqing Zhang · Tamer Basar -
2022 Poster: Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression »
Jingfeng Wu · Difan Zou · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2022 Oral: Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression »
Jingfeng Wu · Difan Zou · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2021 : Solving Multi-Arm Bandit Using a Few Bits of Communication »
Osama Hanna · Lin Yang · Christina Fragouli -
2021 Workshop: Workshop on Reinforcement Learning Theory »
Shipra Agrawal · Simon Du · Niao He · Csaba Szepesvari · Lin Yang -
2021 Poster: Provably Correct Optimization and Exploration with Non-linear Policies »
Fei Feng · Wotao Yin · Alekh Agarwal · Lin Yang -
2021 Poster: Randomized Exploration in Reinforcement Learning with General Value Function Approximation »
Haque Ishfaq · Qiwen Cui · Viet Nguyen · Alex Ayoub · Zhuoran Yang · Zhaoran Wang · Doina Precup · Lin Yang -
2021 Spotlight: Provably Correct Optimization and Exploration with Non-linear Policies »
Fei Feng · Wotao Yin · Alekh Agarwal · Lin Yang -
2021 Spotlight: Randomized Exploration in Reinforcement Learning with General Value Function Approximation »
Haque Ishfaq · Qiwen Cui · Viet Nguyen · Alex Ayoub · Zhuoran Yang · Zhaoran Wang · Doina Precup · Lin Yang -
2021 Poster: Safe Reinforcement Learning with Linear Function Approximation »
Sanae Amani · Christos Thrampoulidis · Lin Yang -
2021 Spotlight: Safe Reinforcement Learning with Linear Function Approximation »
Sanae Amani · Christos Thrampoulidis · Lin Yang -
2020 Poster: Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound »
Lin Yang · Mengdi Wang -
2020 Poster: Model-Based Reinforcement Learning with Value-Targeted Regression »
Alex Ayoub · Zeyu Jia · Csaba Szepesvari · Mengdi Wang · Lin Yang -
2020 Poster: Coresets for Clustering in Graphs of Bounded Treewidth »
Daniel Baker · Vladimir Braverman · Lingxiao Huang · Shaofeng H.-C. Jiang · Robert Krauthgamer · Xuan Wu -
2020 Poster: Schatten Norms in Matrix Streams: Hello Sparsity, Goodbye Dimension »
Vladimir Braverman · Robert Krauthgamer · Aditya Krishnan · Roi Sinoff -
2020 Poster: Nearly Linear Row Sampling Algorithm for Quantile Regression »
Yi Li · Ruosong Wang · Lin Yang · Hanrui Zhang -
2020 Poster: Obtaining Adjustable Regularization for Free via Iterate Averaging »
Jingfeng Wu · Vladimir Braverman · Lin Yang -
2020 Poster: On the Noisy Gradient Descent that Generalizes as SGD »
Jingfeng Wu · Wenqing Hu · Haoyi Xiong · Jun Huan · Vladimir Braverman · Zhanxing Zhu -
2020 Poster: FetchSGD: Communication-Efficient Federated Learning with Sketching »
Daniel Rothchild · Ashwinee Panda · Enayat Ullah · Nikita Ivkin · Ion Stoica · Vladimir Braverman · Joseph E Gonzalez · Raman Arora -
2019 Poster: Coresets for Ordered Weighted Clustering »
Vladimir Braverman · Shaofeng Jiang · Robert Krauthgamer · Xuan Wu -
2019 Oral: Coresets for Ordered Weighted Clustering »
Vladimir Braverman · Shaofeng Jiang · Robert Krauthgamer · Xuan Wu -
2019 Poster: The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects »
Zhanxing Zhu · Jingfeng Wu · Bing Yu · Lei Wu · Jinwen Ma -
2019 Oral: The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects »
Zhanxing Zhu · Jingfeng Wu · Bing Yu · Lei Wu · Jinwen Ma -
2018 Poster: Matrix Norms in Data Streams: Faster, Multi-Pass and Row-Order »
Vladimir Braverman · Stephen Chestnut · Robert Krauthgamer · Yi Li · David Woodruff · Lin Yang -
2018 Oral: Matrix Norms in Data Streams: Faster, Multi-Pass and Row-Order »
Vladimir Braverman · Stephen Chestnut · Robert Krauthgamer · Yi Li · David Woodruff · Lin Yang -
2017 Poster: Clustering High Dimensional Dynamic Data Streams »
Lin Yang · Harry Lang · Christian Sohler · Vladimir Braverman · Gereon Frahling -
2017 Talk: Clustering High Dimensional Dynamic Data Streams »
Lin Yang · Harry Lang · Christian Sohler · Vladimir Braverman · Gereon Frahling