Timezone: »
Oral
Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning
Ronan Fruit · Matteo Pirotta · Alessandro Lazaric · Ronald Ortner
We introduce SCAL, an algorithm designed to perform efficient exploration-exploration in any unknown weakly-communicating Markov Decision Process (MDP) for which an upper bound c on the span of the optimal bias function is known. For an MDP with $S$ states, $A$ actions and $\Gamma \leq S$ possible next states, we prove a regret bound of $O(c\sqrt{\Gamma SAT})$, which significantly improves over existing algorithms (e.g., UCRL and PSRL), whose regret scales linearly with the MDP diameter $D$. In fact, the optimal bias span is finite and often much smaller than $D$ (e.g., $D=+\infty$ in non-communicating MDPs). A similar result was originally derived by Bartlett and Tewari (2009) for REGAL.C, for which no tractable algorithm is available. In this paper, we relax the optimization problem at the core of REGAL.C, we carefully analyze its properties, and we provide the first computationally efficient algorithm to solve it. Finally, we report numerical simulations supporting our theoretical findings and showing how SCAL significantly outperforms UCRL in MDPs with large diameter and small span.
Author Information
Ronan Fruit (Inria Lille Nord-Europe)
Matteo Pirotta (SequeL - Inria Lille - Nord Europe)
Alessandro Lazaric (Facebook AI Research)
Ronald Ortner (Montanuniversitaet Leoben)
Related Events (a corresponding poster, oral, or spotlight)
-
2018 Poster: Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning »
Wed. Jul 11th 04:15 -- 07:00 PM Room Hall B #91
More from the Same Authors
-
2021 : Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection »
Matteo Papini · Andrea Tirinzoni · Aldo Pacchiano · Marcello Restelli · Alessandro Lazaric · Matteo Pirotta -
2021 : A Fully Problem-Dependent Regret Lower Bound for Finite-Horizon MDPs »
Andrea Tirinzoni · Matteo Pirotta · Alessandro Lazaric -
2021 : Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret »
Jean Tarbouriech · Jean Tarbouriech · Simon Du · Matteo Pirotta · Michal Valko · Alessandro Lazaric -
2021 : A general sample complexity analysis of vanilla policy gradient »
Rui Yuan · Robert Gower · Alessandro Lazaric -
2021 : Direct then Diffuse: Incremental Unsupervised Skill Discovery for State Covering and Goal Reaching »
Pierre-Alexandre Kamienny · Jean Tarbouriech · Alessandro Lazaric · Ludovic Denoyer -
2021 : Exploration-Driven Representation Learning in Reinforcement Learning »
Akram Erraqabi · Mingde Zhao · Marlos C. Machado · Yoshua Bengio · Sainbayar Sukhbaatar · Ludovic Denoyer · Alessandro Lazaric -
2023 Poster: Layered State Discovery for Incremental Autonomous Exploration »
Liyu Chen · Andrea Tirinzoni · Alessandro Lazaric · Matteo Pirotta -
2022 Workshop: Responsible Decision Making in Dynamic Environments »
Virginie Do · Thorsten Joachims · Alessandro Lazaric · Joelle Pineau · Matteo Pirotta · Harsh Satija · Nicolas Usunier -
2022 Poster: Scaling Gaussian Process Optimization by Evaluating a Few Unique Candidates Multiple Times »
Daniele Calandriello · Luigi Carratino · Alessandro Lazaric · Michal Valko · Lorenzo Rosasco -
2022 Spotlight: Scaling Gaussian Process Optimization by Evaluating a Few Unique Candidates Multiple Times »
Daniele Calandriello · Luigi Carratino · Alessandro Lazaric · Michal Valko · Lorenzo Rosasco -
2021 : Invited Talk by Alessandro Lazaric »
Alessandro Lazaric -
2021 Poster: Leveraging Good Representations in Linear Contextual Bandits »
Matteo Papini · Andrea Tirinzoni · Marcello Restelli · Alessandro Lazaric · Matteo Pirotta -
2021 Spotlight: Leveraging Good Representations in Linear Contextual Bandits »
Matteo Papini · Andrea Tirinzoni · Marcello Restelli · Alessandro Lazaric · Matteo Pirotta -
2021 Poster: Reinforcement Learning with Prototypical Representations »
Denis Yarats · Rob Fergus · Alessandro Lazaric · Lerrel Pinto -
2021 Spotlight: Reinforcement Learning with Prototypical Representations »
Denis Yarats · Rob Fergus · Alessandro Lazaric · Lerrel Pinto -
2020 Poster: No-Regret Exploration in Goal-Oriented Reinforcement Learning »
Jean Tarbouriech · Evrard Garcelon · Michal Valko · Matteo Pirotta · Alessandro Lazaric -
2020 Poster: Efficient Optimistic Exploration in Linear-Quadratic Regulators via Lagrangian Relaxation »
Marc Abeille · Alessandro Lazaric -
2020 Poster: Learning Near Optimal Policies with Low Inherent Bellman Error »
Andrea Zanette · Alessandro Lazaric · Mykel Kochenderfer · Emma Brunskill -
2020 Poster: Meta-learning with Stochastic Linear Bandits »
Leonardo Cella · Alessandro Lazaric · Massimiliano Pontil -
2020 Poster: Near-linear time Gaussian process optimization with adaptive batching and resparsification »
Daniele Calandriello · Luigi Carratino · Alessandro Lazaric · Michal Valko · Lorenzo Rosasco -
2018 Poster: Importance Weighted Transfer of Samples in Reinforcement Learning »
Andrea Tirinzoni · Andrea Sessa · Matteo Pirotta · Marcello Restelli -
2018 Poster: Stochastic Variance-Reduced Policy Gradient »
Matteo Papini · Damiano Binaghi · Giuseppe Canonaco · Matteo Pirotta · Marcello Restelli -
2018 Poster: Improved large-scale graph learning through ridge spectral sparsification »
Daniele Calandriello · Alessandro Lazaric · Ioannis Koutis · Michal Valko -
2018 Oral: Importance Weighted Transfer of Samples in Reinforcement Learning »
Andrea Tirinzoni · Andrea Sessa · Matteo Pirotta · Marcello Restelli -
2018 Oral: Improved large-scale graph learning through ridge spectral sparsification »
Daniele Calandriello · Alessandro Lazaric · Ioannis Koutis · Michal Valko -
2018 Oral: Stochastic Variance-Reduced Policy Gradient »
Matteo Papini · Damiano Binaghi · Giuseppe Canonaco · Matteo Pirotta · Marcello Restelli -
2018 Poster: Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems »
Marc Abeille · Alessandro Lazaric -
2018 Oral: Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems »
Marc Abeille · Alessandro Lazaric -
2017 Poster: Boosted Fitted Q-Iteration »
Samuele Tosatto · Matteo Pirotta · Carlo D'Eramo · Marcello Restelli -
2017 Talk: Boosted Fitted Q-Iteration »
Samuele Tosatto · Matteo Pirotta · Carlo D'Eramo · Marcello Restelli