When is Agnostic Reinforcement Learning Statistically Tractable?
Gene Li · Zeyu Jia · Alexander Rakhlin · Ayush Sekhari · Nati Srebro
Event URL: https://openreview.net/forum?id=WZE4c4hPgH »
We study the problem of agnostic PAC reinforcement learning (RL): given a policy class $\Pi$, how many rounds of interaction with an unknown MDP (with a potentially large state and action space) are required to learn an $\epsilon$-suboptimal policy with respect to $\Pi$? Towards that end, we introduce a new complexity measure, called the spanning capacity, that depends solely on the set $\Pi$ and is independent of the MDP dynamics. With a generative model, we show that the spanning capacity characterizes PAC learnability for every policy class $\Pi$. However, for online RL, the situation is more subtle. We show there exists a policy class $\Pi$ with bounded spanning capacity that requires a superpolynomial number of samples to learn. This reveals a surprising separation for agnostic learnability between generative access and online access models (as well as between deterministic/stochastic MDPs under online access). On the positive side, we identify an additional sunflower structure which, in conjunction with bounded spanning capacity, enables statistically efficient online RL via a new algorithm called POPLER, which takes inspiration from classical importance sampling methods as well as recent developments for reachable-state identification and policy evaluation in reward-free exploration.
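The abstract credits POPLER's design in part to classical importance sampling methods for policy evaluation. As background, the sketch below is a minimal, illustrative implementation of that classical per-trajectory importance-weighted estimator, not of POPLER itself; the function names and the (states, actions, rewards) trajectory format are assumptions made for this example.

```python
import numpy as np

def importance_weighted_value(trajectories, target_policy, behavior_policy):
    """Per-trajectory importance-sampling estimate of the target policy's
    expected return from data collected by a behavior policy:

        V_hat = (1/n) * sum_i w_i * R_i,
        w_i   = prod_h pi(a_h | s_h) / mu(a_h | s_h).

    trajectories    : list of (states, actions, rewards) tuples, each a
                      sequence over horizon steps h = 0, ..., H-1.
    target_policy   : callable (s, a) -> probability pi(a | s).
    behavior_policy : callable (s, a) -> probability mu(a | s); must place
                      positive probability on every action the target
                      policy can take (coverage condition).
    """
    estimates = []
    for states, actions, rewards in trajectories:
        weight = 1.0
        for s, a in zip(states, actions):
            # Accumulate the likelihood ratio of the observed actions.
            weight *= target_policy(s, a) / behavior_policy(s, a)
        estimates.append(weight * sum(rewards))
    return float(np.mean(estimates))

if __name__ == "__main__":
    # Toy check on a horizon-2 problem with 2 actions: evaluate the
    # "always take action 1" policy from uniformly random exploration data.
    rng = np.random.default_rng(0)
    H, n = 2, 10_000
    behavior = lambda s, a: 0.5                   # uniform over {0, 1}
    target = lambda s, a: 1.0 if a == 1 else 0.0  # deterministic policy
    data = []
    for _ in range(n):
        actions = list(rng.integers(0, 2, size=H))
        states = list(range(H))                   # dummy deterministic states
        rewards = [float(a) for a in actions]     # reward 1 for action 1
        data.append((states, actions, rewards))
    # True value of the target policy is H = 2.
    print(importance_weighted_value(data, target, behavior))
```

This estimator is unbiased whenever the behavior policy covers the target policy, but its variance can grow exponentially with the horizon; that is the usual obstacle to applying it directly, and importance sampling serves only as one ingredient among the techniques the abstract mentions.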
Author Information
Gene Li (TTIC)
Zeyu Jia (Massachusetts Institute of Technology)
Alexander Rakhlin (MIT)
Ayush Sekhari (Cornell University)
Nati Srebro (Toyota Technological Institute at Chicago)
More from the Same Authors
- 2021 : Remember What You Want to Forget: Algorithms for Machine Unlearning »
  Ayush Sekhari · Jayadev Acharya · Gautam Kamath · Ananda Theertha Suresh
- 2023 : On the Still Unreasonable Effectiveness of Federated Averaging for Heterogeneous Distributed Learning »
  Kumar Kshitij Patel · Margalit Glasgow · Lingxiao Wang · Nirmit Joshi · Nati Srebro
- 2023 : Contextual Bandits and Imitation Learning with Preference-Based Active Queries »
  Ayush Sekhari · Karthik Sridharan · Wen Sun · Runzhe Wu
- 2023 : Selective Sampling and Imitation Learning via Online Regression »
  Ayush Sekhari · Karthik Sridharan · Wen Sun · Runzhe Wu
- 2023 Poster: Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL »
  Zakaria Mhammedi · Dylan Foster · Alexander Rakhlin
- 2023 Poster: Federated Online and Bandit Convex Optimization »
  Kumar Kshitij Patel · Lingxiao Wang · Aadirupa Saha · Nati Srebro
- 2023 Poster: Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings »
  Masatoshi Uehara · Ayush Sekhari · Jason Lee · Nathan Kallus · Wen Sun
- 2023 Oral: Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL »
  Zakaria Mhammedi · Dylan Foster · Alexander Rakhlin
- 2023 Poster: Continual Learning in Linear Classification on Separable Data »
  Itay Evron · Edward Moroshko · Gon Buzaglo · Maroun Khriesh · Badea Marjieh · Nati Srebro · Daniel Soudry
- 2022 Poster: Implicit Bias of the Step Size in Linear Diagonal Neural Networks »
  Mor Shpigel Nacson · Kavya Ravichandran · Nati Srebro · Daniel Soudry
- 2022 Spotlight: Implicit Bias of the Step Size in Linear Diagonal Neural Networks »
  Mor Shpigel Nacson · Kavya Ravichandran · Nati Srebro · Daniel Soudry
- 2022 : Q&A II »
  Dylan Foster · Alexander Rakhlin
- 2022 : Q&A »
  Dylan Foster · Alexander Rakhlin
- 2022 Tutorial: Bridging Learning and Decision Making »
  Dylan Foster · Alexander Rakhlin
- 2022 : Bridging Learning and Decision Making: Part I »
  Alexander Rakhlin
- 2021 Poster: Fast margin maximization via dual acceleration »
  Ziwei Ji · Nati Srebro · Matus Telgarsky
- 2021 Spotlight: Fast margin maximization via dual acceleration »
  Ziwei Ji · Nati Srebro · Matus Telgarsky
- 2021 Poster: Quantifying the Benefit of Using Differentiable Learning over Tangent Kernels »
  Eran Malach · Pritish Kamath · Emmanuel Abbe · Nati Srebro
- 2021 Spotlight: Quantifying the Benefit of Using Differentiable Learning over Tangent Kernels »
  Eran Malach · Pritish Kamath · Emmanuel Abbe · Nati Srebro
- 2021 Poster: Top-k eXtreme Contextual Bandits with Arm Hierarchy »
  Rajat Sen · Alexander Rakhlin · Lexing Ying · Rahul Kidambi · Dean Foster · Daniel Hill · Inderjit Dhillon
- 2021 Poster: Dropout: Explicit Forms and Capacity Control »
  Raman Arora · Peter Bartlett · Poorya Mianjy · Nati Srebro
- 2021 Spotlight: Dropout: Explicit Forms and Capacity Control »
  Raman Arora · Peter Bartlett · Poorya Mianjy · Nati Srebro
- 2021 Spotlight: Top-k eXtreme Contextual Bandits with Arm Hierarchy »
  Rajat Sen · Alexander Rakhlin · Lexing Ying · Rahul Kidambi · Dean Foster · Daniel Hill · Inderjit Dhillon
- 2021 Poster: On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent »
  Shahar Azulay · Edward Moroshko · Mor Shpigel Nacson · Blake Woodworth · Nati Srebro · Amir Globerson · Daniel Soudry
- 2021 Oral: On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent »
  Shahar Azulay · Edward Moroshko · Mor Shpigel Nacson · Blake Woodworth · Nati Srebro · Amir Globerson · Daniel Soudry
- 2020 Poster: Efficiently Learning Adversarially Robust Halfspaces with Noise »
  Omar Montasser · Surbhi Goel · Ilias Diakonikolas · Nati Srebro
- 2020 Poster: Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles »
  Dylan Foster · Alexander Rakhlin
- 2020 Poster: Is Local SGD Better than Minibatch SGD? »
  Blake Woodworth · Kumar Kshitij Patel · Sebastian Stich · Zhen Dai · Brian Bullins · Brendan McMahan · Ohad Shamir · Nati Srebro
- 2020 Poster: Fair Learning with Private Demographic Data »
  Hussein Mozannar · Mesrob Ohannessian · Nati Srebro
- 2019 : Nati Srebro: Optimization’s Untold Gift to Learning: Implicit Regularization »
  Nati Srebro
- 2019 : Panel Discussion (Nati Srebro, Dan Roy, Chelsea Finn, Mikhail Belkin, Aleksander Mądry, Jason Lee) »
  Nati Srebro · Daniel Roy · Chelsea Finn · Mikhail Belkin · Aleksander Madry · Jason Lee
- 2019 Workshop: Understanding and Improving Generalization in Deep Learning »
  Dilip Krishnan · Hossein Mobahi · Behnam Neyshabur · Peter Bartlett · Dawn Song · Nati Srebro
- 2019 Poster: Semi-Cyclic Stochastic Gradient Descent »
  Hubert Eichner · Tomer Koren · Brendan McMahan · Nati Srebro · Kunal Talwar
- 2019 Oral: Semi-Cyclic Stochastic Gradient Descent »
  Hubert Eichner · Tomer Koren · Brendan McMahan · Nati Srebro · Kunal Talwar
- 2019 Poster: Near optimal finite time identification of arbitrary linear dynamical systems »
  Tuhin Sarkar · Alexander Rakhlin
- 2019 Oral: Near optimal finite time identification of arbitrary linear dynamical systems »
  Tuhin Sarkar · Alexander Rakhlin
- 2019 Poster: Training Well-Generalizing Classifiers for Fairness Metrics and Other Data-Dependent Constraints »
  Andrew Cotter · Maya Gupta · Heinrich Jiang · Nati Srebro · Karthik Sridharan · Serena Wang · Blake Woodworth · Seungil You
- 2019 Poster: Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models »
  Mor Shpigel Nacson · Suriya Gunasekar · Jason Lee · Nati Srebro · Daniel Soudry
- 2019 Oral: Training Well-Generalizing Classifiers for Fairness Metrics and Other Data-Dependent Constraints »
  Andrew Cotter · Maya Gupta · Heinrich Jiang · Nati Srebro · Karthik Sridharan · Serena Wang · Blake Woodworth · Seungil You
- 2019 Oral: Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models »
  Mor Shpigel Nacson · Suriya Gunasekar · Jason Lee · Nati Srebro · Daniel Soudry
- 2018 Poster: Characterizing Implicit Bias in Terms of Optimization Geometry »
  Suriya Gunasekar · Jason Lee · Daniel Soudry · Nati Srebro
- 2018 Oral: Characterizing Implicit Bias in Terms of Optimization Geometry »
  Suriya Gunasekar · Jason Lee · Daniel Soudry · Nati Srebro
- 2017 Poster: Efficient Distributed Learning with Sparsity »
  Jialei Wang · Mladen Kolar · Nati Srebro · Tong Zhang
- 2017 Talk: Efficient Distributed Learning with Sparsity »
  Jialei Wang · Mladen Kolar · Nati Srebro · Tong Zhang
- 2017 Poster: Communication-efficient Algorithms for Distributed Stochastic Principal Component Analysis »
  Dan Garber · Ohad Shamir · Nati Srebro
- 2017 Talk: Communication-efficient Algorithms for Distributed Stochastic Principal Component Analysis »
  Dan Garber · Ohad Shamir · Nati Srebro