We study the role of the representation in finite-horizon Markov Decision Processes (MDPs) with linear structure. We provide a necessary condition for achieving constant regret in any MDP with a linear reward representation (even under known dynamics). This condition encompasses the well-known case of low-rank MDPs and, more generally, that of zero inherent Bellman error. We show that the condition is not only necessary but also sufficient for these classes by deriving a constant regret bound for two optimistic algorithms. To the best of our knowledge, this is the first constant regret result for MDPs. Finally, we study the problem of representation selection, showing that our proposed algorithm achieves constant regret when one of the given representations is "good". Moreover, the algorithm can combine representations and achieve constant regret even when none of the individual representations would suffice on its own.
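The abstract refers to optimistic algorithms over a linear reward representation. As a rough illustration of the general principle (not the paper's algorithm), the following is a minimal sketch of optimism with a linear representation in the simpler contextual-bandit setting: a ridge estimate of the reward parameter plus an ellipsoidal exploration bonus. All names (`Phi`, `theta`, `beta`, the synthetic data) are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n_arms, T = 4, 6, 2000
Phi = rng.normal(size=(n_arms, d))   # assumed linear representation: one feature row per arm
theta = rng.normal(size=d)           # unknown reward parameter (synthetic)
theta /= np.linalg.norm(theta)

lam, beta = 1.0, 1.0                 # ridge regularizer and confidence-width parameter
A = lam * np.eye(d)                  # regularized design matrix
b = np.zeros(d)

best = Phi @ theta                   # true mean rewards, used only to measure regret
regret = 0.0
for t in range(T):
    theta_hat = np.linalg.solve(A, b)          # ridge least-squares estimate
    Ainv = np.linalg.inv(A)
    # optimistic index: estimated reward plus an ellipsoidal bonus ||phi(a)||_{A^{-1}}
    bonus = np.sqrt(np.einsum('ad,de,ae->a', Phi, Ainv, Phi))
    a = int(np.argmax(Phi @ theta_hat + beta * bonus))
    r = Phi[a] @ theta + 0.1 * rng.normal()    # noisy linear reward
    A += np.outer(Phi[a], Phi[a])
    b += r * Phi[a]
    regret += best.max() - best[a]

print(regret)
```

Whether such a scheme enjoys constant (rather than logarithmic) regret depends on properties of the representation, which is precisely the question the paper studies in the MDP setting.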
Author Information
Matteo Papini (Universitat Pompeu Fabra)
Andrea Tirinzoni (Inria)
Aldo Pacchiano (UC Berkeley)
Marcello Restelli (Politecnico di Milano)
Alessandro Lazaric (Facebook AI Research)
Matteo Pirotta (Facebook AI Research)
More from the Same Authors
-
2021 : Meta Learning the Step Size in Policy Gradient Methods »
Luca Sabbioni · Francesco Corda · Marcello Restelli -
2021 : Subgaussian Importance Sampling for Off-Policy Evaluation and Learning »
Alberto Maria Metelli · Alessio Russo · Marcello Restelli -
2021 : The Importance of Non-Markovianity in Maximum State Entropy Exploration »
Mirco Mutti · Riccardo De Santi · Marcello Restelli -
2021 : Efficient Inverse Reinforcement Learning of Transferable Rewards »
Giorgia Ramponi · Alberto Maria Metelli · Marcello Restelli -
2021 : Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity »
Dhruv Malik · Aldo Pacchiano · Vishwak Srinivasan · Yuanzhi Li -
2021 : A Fully Problem-Dependent Regret Lower Bound for Finite-Horizon MDPs »
Andrea Tirinzoni · Matteo Pirotta · Alessandro Lazaric -
2021 : Bridging The Gap between Local and Joint Differential Privacy in RL »
Evrard Garcelon · Vianney Perchet · Ciara Pike-Burke · Matteo Pirotta -
2021 : Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret »
Jean Tarbouriech · Simon Du · Matteo Pirotta · Michal Valko · Alessandro Lazaric -
2021 : A general sample complexity analysis of vanilla policy gradient »
Rui Yuan · Robert Gower · Alessandro Lazaric -
2021 : Estimating Optimal Policy Value in Linear Contextual Bandits beyond Gaussianity »
Jonathan Lee · Weihao Kong · Aldo Pacchiano · Vidya Muthukumar · Emma Brunskill -
2021 : Meta Learning MDPs with linear transition models »
Robert Müller · Aldo Pacchiano · Jack Parker-Holder -
2021 : Reward-Free Policy Space Compression for Reinforcement Learning »
Mirco Mutti · Stefano Del Col · Marcello Restelli -
2021 : Learning to Explore Multiple Environments without Rewards »
Mirco Mutti · Mattia Mancassola · Marcello Restelli -
2021 : Direct then Diffuse: Incremental Unsupervised Skill Discovery for State Covering and Goal Reaching »
Pierre-Alexandre Kamienny · Jean Tarbouriech · Alessandro Lazaric · Ludovic Denoyer -
2021 : Exploration-Driven Representation Learning in Reinforcement Learning »
Akram Erraqabi · Mingde Zhao · Marlos C. Machado · Yoshua Bengio · Sainbayar Sukhbaatar · Ludovic Denoyer · Alessandro Lazaric -
2021 : On the Theory of Reinforcement Learning with Once-per-Episode Feedback »
Niladri Chatterji · Aldo Pacchiano · Peter Bartlett · Michael Jordan -
2022 : Challenging Common Assumptions in Convex Reinforcement Learning »
Mirco Mutti · Riccardo De Santi · Piersilvio De Bartolomeis · Marcello Restelli -
2022 : Stochastic Rising Bandits for Online Model Selection »
Alberto Maria Metelli · Francesco Trovò · Matteo Pirola · Marcello Restelli -
2022 : Dynamical Linear Bandits for Long-Lasting Vanishing Rewards »
Marco Mussi · Alberto Maria Metelli · Marcello Restelli -
2022 : Invariance Discovery for Systematic Generalization in Reinforcement Learning »
Mirco Mutti · Riccardo De Santi · Emanuele Rossi · Juan Calderon · Michael Bronstein · Marcello Restelli -
2022 : Recursive History Representations for Unsupervised Reinforcement Learning in Multiple-Environments »
Mirco Mutti · Pietro Maldini · Riccardo De Santi · Marcello Restelli -
2022 : Directed Exploration via Uncertainty-Aware Critics »
Amarildo Likmeta · Matteo Sacco · Alberto Maria Metelli · Marcello Restelli -
2022 : Non-Markovian Policies for Unsupervised Reinforcement Learning in Multiple Environments »
Pietro Maldini · Mirco Mutti · Riccardo De Santi · Marcello Restelli -
2023 Poster: Truncating Trajectories in Monte Carlo Reinforcement Learning »
Riccardo Poiani · Alberto Maria Metelli · Marcello Restelli -
2023 Poster: Layered State Discovery for Incremental Autonomous Exploration »
Liyu Chen · Andrea Tirinzoni · Alessandro Lazaric · Matteo Pirotta -
2023 Poster: Towards Theoretical Understanding of Inverse Reinforcement Learning »
Alberto Maria Metelli · Filippo Lazzati · Marcello Restelli -
2023 Poster: Leveraging Offline Data in Online Reinforcement Learning »
Andrew Wagenmaker · Aldo Pacchiano -
2023 Poster: Dynamical Linear Bandits »
Marco Mussi · Alberto Maria Metelli · Marcello Restelli -
2023 Oral: Towards Theoretical Understanding of Inverse Reinforcement Learning »
Alberto Maria Metelli · Filippo Lazzati · Marcello Restelli -
2022 Workshop: Responsible Decision Making in Dynamic Environments »
Virginie Do · Thorsten Joachims · Alessandro Lazaric · Joelle Pineau · Matteo Pirotta · Harsh Satija · Nicolas Usunier -
2022 Poster: The Importance of Non-Markovianity in Maximum State Entropy Exploration »
Mirco Mutti · Riccardo De Santi · Marcello Restelli -
2022 Poster: Balancing Sample Efficiency and Suboptimality in Inverse Reinforcement Learning »
Angelo Damiani · Giorgio Manganini · Alberto Maria Metelli · Marcello Restelli -
2022 Spotlight: Balancing Sample Efficiency and Suboptimality in Inverse Reinforcement Learning »
Angelo Damiani · Giorgio Manganini · Alberto Maria Metelli · Marcello Restelli -
2022 Oral: The Importance of Non-Markovianity in Maximum State Entropy Exploration »
Mirco Mutti · Riccardo De Santi · Marcello Restelli -
2022 Poster: Stochastic Rising Bandits »
Alberto Maria Metelli · Francesco Trovò · Matteo Pirola · Marcello Restelli -
2022 Poster: Delayed Reinforcement Learning by Imitation »
Pierre Liotet · Davide Maran · Lorenzo Bisi · Marcello Restelli -
2022 Spotlight: Delayed Reinforcement Learning by Imitation »
Pierre Liotet · Davide Maran · Lorenzo Bisi · Marcello Restelli -
2022 Spotlight: Stochastic Rising Bandits »
Alberto Maria Metelli · Francesco Trovò · Matteo Pirola · Marcello Restelli -
2022 Poster: Scaling Gaussian Process Optimization by Evaluating a Few Unique Candidates Multiple Times »
Daniele Calandriello · Luigi Carratino · Alessandro Lazaric · Michal Valko · Lorenzo Rosasco -
2022 Poster: Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback »
Tianyi Lin · Aldo Pacchiano · Yaodong Yu · Michael Jordan -
2022 Spotlight: Scaling Gaussian Process Optimization by Evaluating a Few Unique Candidates Multiple Times »
Daniele Calandriello · Luigi Carratino · Alessandro Lazaric · Michal Valko · Lorenzo Rosasco -
2022 Spotlight: Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback »
Tianyi Lin · Aldo Pacchiano · Yaodong Yu · Michael Jordan -
2021 : Invited Talk by Alessandro Lazaric »
Alessandro Lazaric -
2021 Poster: Leveraging Good Representations in Linear Contextual Bandits »
Matteo Papini · Andrea Tirinzoni · Marcello Restelli · Alessandro Lazaric · Matteo Pirotta -
2021 Spotlight: Leveraging Good Representations in Linear Contextual Bandits »
Matteo Papini · Andrea Tirinzoni · Marcello Restelli · Alessandro Lazaric · Matteo Pirotta -
2021 Poster: Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity »
Dhruv Malik · Aldo Pacchiano · Vishwak Srinivasan · Yuanzhi Li -
2021 Poster: Dynamic Balancing for Model Selection in Bandits and RL »
Ashok Cutkosky · Christoph Dann · Abhimanyu Das · Claudio Gentile · Aldo Pacchiano · Manish Purohit -
2021 Spotlight: Dynamic Balancing for Model Selection in Bandits and RL »
Ashok Cutkosky · Christoph Dann · Abhimanyu Das · Claudio Gentile · Aldo Pacchiano · Manish Purohit -
2021 Spotlight: Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity »
Dhruv Malik · Aldo Pacchiano · Vishwak Srinivasan · Yuanzhi Li -
2021 Poster: Kernel-Based Reinforcement Learning: A Finite-Time Analysis »
Omar Darwiche Domingues · Pierre Menard · Matteo Pirotta · Emilie Kaufmann · Michal Valko -
2021 Spotlight: Kernel-Based Reinforcement Learning: A Finite-Time Analysis »
Omar Darwiche Domingues · Pierre Menard · Matteo Pirotta · Emilie Kaufmann · Michal Valko -
2021 Poster: Reinforcement Learning with Prototypical Representations »
Denis Yarats · Rob Fergus · Alessandro Lazaric · Lerrel Pinto -
2021 Poster: Provably Efficient Learning of Transferable Rewards »
Alberto Maria Metelli · Giorgia Ramponi · Alessandro Concetti · Marcello Restelli -
2021 Spotlight: Provably Efficient Learning of Transferable Rewards »
Alberto Maria Metelli · Giorgia Ramponi · Alessandro Concetti · Marcello Restelli -
2021 Spotlight: Reinforcement Learning with Prototypical Representations »
Denis Yarats · Rob Fergus · Alessandro Lazaric · Lerrel Pinto -
2020 Poster: On Thompson Sampling with Langevin Algorithms »
Eric Mazumdar · Aldo Pacchiano · Yian Ma · Michael Jordan · Peter Bartlett -
2020 Poster: Accelerated Message Passing for Entropy-Regularized MAP Inference »
Jonathan Lee · Aldo Pacchiano · Peter Bartlett · Michael Jordan -
2020 Poster: Stochastic Flows and Geometric Optimization on the Orthogonal Group »
Krzysztof Choromanski · David Cheikhi · Jared Quincy Davis · Valerii Likhosherstov · Achille Nazaret · Achraf Bahamou · Xingyou Song · Mrugank Akarte · Jack Parker-Holder · Jacob Bergquist · Yuan Gao · Aldo Pacchiano · Tamas Sarlos · Adrian Weller · Vikas Sindhwani -
2020 Poster: Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning »
Alberto Maria Metelli · Flavio Mazzolini · Lorenzo Bisi · Luca Sabbioni · Marcello Restelli -
2020 Poster: No-Regret Exploration in Goal-Oriented Reinforcement Learning »
Jean Tarbouriech · Evrard Garcelon · Michal Valko · Matteo Pirotta · Alessandro Lazaric -
2020 Poster: Efficient Optimistic Exploration in Linear-Quadratic Regulators via Lagrangian Relaxation »
Marc Abeille · Alessandro Lazaric -
2020 Poster: Learning Near Optimal Policies with Low Inherent Bellman Error »
Andrea Zanette · Alessandro Lazaric · Mykel Kochenderfer · Emma Brunskill -
2020 Poster: Learning to Score Behaviors for Guided Policy Optimization »
Aldo Pacchiano · Jack Parker-Holder · Yunhao Tang · Krzysztof Choromanski · Anna Choromanska · Michael Jordan -
2020 Poster: Meta-learning with Stochastic Linear Bandits »
Leonardo Cella · Alessandro Lazaric · Massimiliano Pontil -
2020 Poster: Sequential Transfer in Reinforcement Learning with a Generative Model »
Andrea Tirinzoni · Riccardo Poiani · Marcello Restelli -
2020 Poster: Ready Policy One: World Building Through Active Learning »
Philip Ball · Jack Parker-Holder · Aldo Pacchiano · Krzysztof Choromanski · Stephen Roberts -
2020 Poster: Near-linear time Gaussian process optimization with adaptive batching and resparsification »
Daniele Calandriello · Luigi Carratino · Alessandro Lazaric · Michal Valko · Lorenzo Rosasco -
2019 Poster: Reinforcement Learning in Configurable Continuous Environments »
Alberto Maria Metelli · Emanuele Ghelfi · Marcello Restelli -
2019 Oral: Reinforcement Learning in Configurable Continuous Environments »
Alberto Maria Metelli · Emanuele Ghelfi · Marcello Restelli -
2019 Poster: Transfer of Samples in Policy Search via Multiple Importance Sampling »
Andrea Tirinzoni · Mattia Salvini · Marcello Restelli -
2019 Oral: Transfer of Samples in Policy Search via Multiple Importance Sampling »
Andrea Tirinzoni · Mattia Salvini · Marcello Restelli -
2019 Poster: Optimistic Policy Optimization via Multiple Importance Sampling »
Matteo Papini · Alberto Maria Metelli · Lorenzo Lupo · Marcello Restelli -
2019 Poster: Online learning with kernel losses »
Niladri Chatterji · Aldo Pacchiano · Peter Bartlett -
2019 Oral: Optimistic Policy Optimization via Multiple Importance Sampling »
Matteo Papini · Alberto Maria Metelli · Lorenzo Lupo · Marcello Restelli -
2019 Oral: Online learning with kernel losses »
Niladri Chatterji · Aldo Pacchiano · Peter Bartlett -
2018 Poster: Importance Weighted Transfer of Samples in Reinforcement Learning »
Andrea Tirinzoni · Andrea Sessa · Matteo Pirotta · Marcello Restelli -
2018 Poster: Stochastic Variance-Reduced Policy Gradient »
Matteo Papini · Damiano Binaghi · Giuseppe Canonaco · Matteo Pirotta · Marcello Restelli -
2018 Poster: Configurable Markov Decision Processes »
Alberto Maria Metelli · Mirco Mutti · Marcello Restelli -
2018 Poster: Improved large-scale graph learning through ridge spectral sparsification »
Daniele Calandriello · Alessandro Lazaric · Ioannis Koutis · Michal Valko -
2018 Oral: Importance Weighted Transfer of Samples in Reinforcement Learning »
Andrea Tirinzoni · Andrea Sessa · Matteo Pirotta · Marcello Restelli -
2018 Oral: Configurable Markov Decision Processes »
Alberto Maria Metelli · Mirco Mutti · Marcello Restelli -
2018 Oral: Improved large-scale graph learning through ridge spectral sparsification »
Daniele Calandriello · Alessandro Lazaric · Ioannis Koutis · Michal Valko -
2018 Oral: Stochastic Variance-Reduced Policy Gradient »
Matteo Papini · Damiano Binaghi · Giuseppe Canonaco · Matteo Pirotta · Marcello Restelli -
2018 Poster: Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning »
Ronan Fruit · Matteo Pirotta · Alessandro Lazaric · Ronald Ortner -
2018 Poster: Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems »
Marc Abeille · Alessandro Lazaric -
2018 Oral: Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems »
Marc Abeille · Alessandro Lazaric -
2018 Oral: Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning »
Ronan Fruit · Matteo Pirotta · Alessandro Lazaric · Ronald Ortner -
2017 Poster: Boosted Fitted Q-Iteration »
Samuele Tosatto · Matteo Pirotta · Carlo D'Eramo · Marcello Restelli -
2017 Talk: Boosted Fitted Q-Iteration »
Samuele Tosatto · Matteo Pirotta · Carlo D'Eramo · Marcello Restelli