Timezone: »
Talk
Minimax Regret Bounds for Reinforcement Learning
Mohammad Gheshlaghi Azar · Ian Osband · Remi Munos
We consider the problem of provably optimal exploration in reinforcement learning for finite horizon MDPs.
We show that an optimistic modification to value iteration achieves a regret bound of $\tilde {O}( \sqrt{HSAT} + H^2S^2A+H\sqrt{T})$ where $H$ is the time horizon, $S$ the number of states, $A$ the number of actions and $T$ the number of time-steps.
This result improves over the best previous known bound $\tilde {O}(HS \sqrt{AT})$ achieved by the UCRL2 algorithm.
The key significance of our new results is that when $T\geq H^3S^3A$ and $SA\geq H$, it leads to a regret of $\tilde{O}(\sqrt{HSAT})$ that matches the established lower bound of $\Omega(\sqrt{HSAT})$ up to a logarithmic factor.
Our analysis contain two key insights.
We use careful application of concentration inequalities to the optimal value function as a whole, rather than to the transitions probabilities (to improve scaling in $S$), and we use "exploration bonuses" built from Bernstein's inequality, together with using a recursive -Bellman-type- Law of Total Variance (to improve scaling in $H$).
Author Information
Mohammad Gheshlaghi Azar (Deepmind)
Ian Osband (Google DeepMind)
Remi Munos (DeepMind)
Related Events (a corresponding poster, oral, or spotlight)
-
2017 Poster: Minimax Regret Bounds for Reinforcement Learning »
Mon. Aug 7th 08:30 AM -- 12:00 PM Room Gallery #12
More from the Same Authors
-
2022 Poster: Generalised Policy Improvement with Geometric Policy Composition »
Shantanu Thakoor · Mark Rowland · Diana Borsa · Will Dabney · Remi Munos · Andre Barreto -
2022 Oral: Generalised Policy Improvement with Geometric Policy Composition »
Shantanu Thakoor · Mark Rowland · Diana Borsa · Will Dabney · Remi Munos · Andre Barreto -
2020 Poster: Fast computation of Nash Equilibria in Imperfect Information Games »
Remi Munos · Julien Perolat · Jean-Baptiste Lespiau · Mark Rowland · Bart De Vylder · Marc Lanctot · Finbarr Timbers · Daniel Hennes · Shayegan Omidshafiei · Audrunas Gruslys · Mohammad Gheshlaghi Azar · Edward Lockhart · Karl Tuyls -
2020 Poster: Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning »
Zhaohan Guo · Bernardo Avila Pires · Bilal Piot · Jean-Bastien Grill · Florent Altché · Remi Munos · Mohammad Gheshlaghi Azar -
2019 Poster: Statistics and Samples in Distributional Reinforcement Learning »
Mark Rowland · Robert Dadashi · Saurabh Kumar · Remi Munos · Marc Bellemare · Will Dabney -
2019 Oral: Statistics and Samples in Distributional Reinforcement Learning »
Mark Rowland · Robert Dadashi · Saurabh Kumar · Remi Munos · Marc Bellemare · Will Dabney -
2018 Poster: The Uncertainty Bellman Equation and Exploration »
Brendan O'Donoghue · Ian Osband · Remi Munos · Vlad Mnih -
2018 Poster: IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures »
Lasse Espeholt · Hubert Soyer · Remi Munos · Karen Simonyan · Vlad Mnih · Tom Ward · Yotam Doron · Vlad Firoiu · Tim Harley · Iain Dunning · Shane Legg · Koray Kavukcuoglu -
2018 Poster: Autoregressive Quantile Networks for Generative Modeling »
Georg Ostrovski · Will Dabney · Remi Munos -
2018 Oral: The Uncertainty Bellman Equation and Exploration »
Brendan O'Donoghue · Ian Osband · Remi Munos · Vlad Mnih -
2018 Oral: Autoregressive Quantile Networks for Generative Modeling »
Georg Ostrovski · Will Dabney · Remi Munos -
2018 Oral: IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures »
Lasse Espeholt · Hubert Soyer · Remi Munos · Karen Simonyan · Vlad Mnih · Tom Ward · Yotam Doron · Vlad Firoiu · Tim Harley · Iain Dunning · Shane Legg · Koray Kavukcuoglu -
2018 Poster: Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement »
Andre Barreto · Diana Borsa · John Quan · Tom Schaul · David Silver · Matteo Hessel · Daniel J. Mankowitz · Augustin Zidek · Remi Munos -
2018 Poster: Learning to search with MCTSnets »
Arthur Guez · Theophane Weber · Ioannis Antonoglou · Karen Simonyan · Oriol Vinyals · Daan Wierstra · Remi Munos · David Silver -
2018 Poster: Implicit Quantile Networks for Distributional Reinforcement Learning »
Will Dabney · Georg Ostrovski · David Silver · Remi Munos -
2018 Oral: Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement »
Andre Barreto · Diana Borsa · John Quan · Tom Schaul · David Silver · Matteo Hessel · Daniel J. Mankowitz · Augustin Zidek · Remi Munos -
2018 Oral: Implicit Quantile Networks for Distributional Reinforcement Learning »
Will Dabney · Georg Ostrovski · David Silver · Remi Munos -
2018 Oral: Learning to search with MCTSnets »
Arthur Guez · Theophane Weber · Ioannis Antonoglou · Karen Simonyan · Oriol Vinyals · Daan Wierstra · Remi Munos · David Silver -
2017 Poster: Count-Based Exploration with Neural Density Models »
Georg Ostrovski · Marc Bellemare · Aäron van den Oord · Remi Munos -
2017 Talk: Count-Based Exploration with Neural Density Models »
Georg Ostrovski · Marc Bellemare · Aäron van den Oord · Remi Munos -
2017 Poster: A Distributional Perspective on Reinforcement Learning »
Marc Bellemare · Will Dabney · Remi Munos -
2017 Poster: Automated Curriculum Learning for Neural Networks »
Alex Graves · Marc Bellemare · Jacob Menick · Remi Munos · Koray Kavukcuoglu -
2017 Talk: A Distributional Perspective on Reinforcement Learning »
Marc Bellemare · Will Dabney · Remi Munos -
2017 Talk: Automated Curriculum Learning for Neural Networks »
Alex Graves · Marc Bellemare · Jacob Menick · Remi Munos · Koray Kavukcuoglu