The ability to transfer skills across tasks has the potential to scale up reinforcement learning (RL) agents to environments currently out of reach. Recently, a framework based on two ideas, successor features (SFs) and generalised policy improvement (GPI), has been introduced as a principled way of transferring skills. In this paper we extend the SF&GPI framework in two ways. One of the basic assumptions underlying the original formulation of SF&GPI is that rewards for all tasks of interest can be computed as linear combinations of a fixed set of features. We relax this constraint and show that the theoretical guarantees supporting the framework can be extended to any set of tasks that only differ in the reward function. Our second contribution is to show that one can use the reward functions themselves as features for future tasks, without any loss of expressiveness, thus removing the need to specify a set of features beforehand. This makes it possible to combine SF&GPI with deep learning in a more stable way. We empirically verify this claim on a complex 3D environment where observations are images from a first-person perspective. We show that the transfer promoted by SF&GPI leads to very good policies on unseen tasks almost instantaneously. We also describe how to learn policies specialised to the new tasks in a way that allows them to be added to the agent's set of skills, and thus be reused in the future.
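The mechanics the abstract relies on can be sketched in a few lines of NumPy. This is a minimal tabular sketch, not the authors' implementation. It assumes the successor features ψ of each stored policy are already available as an array, and the names `sf_q_values`, `gpi_action` and `infer_task_weights` are hypothetical. Given task weights w, each stored policy is evaluated on the new task as Q(s, a) = ψ(s, a) · w. GPI then acts greedily with respect to the maximum over the stored policies. When the reward functions of earlier tasks serve as features, a new task's w can be estimated by regressing observed rewards on those features. Using least squares for that regression is an assumption made for illustration.

```python
import numpy as np

# Hypothetical helper names; a small tabular setting is assumed throughout.

def sf_q_values(psi: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Evaluate every stored policy on the task defined by weights w.

    psi: successor features, shape (n_policies, n_states, n_actions, d)
    w:   task weight vector, shape (d,)
    Returns Q-values of shape (n_policies, n_states, n_actions), using
    Q^{pi_i}_w(s, a) = psi^{pi_i}(s, a) . w
    """
    return psi @ w


def gpi_action(psi: np.ndarray, w: np.ndarray, state: int) -> int:
    """Generalised policy improvement: argmax_a max_i Q^{pi_i}_w(s, a)."""
    q = sf_q_values(psi, w)[:, state, :]  # shape (n_policies, n_actions)
    return int(np.argmax(q.max(axis=0)))  # best action under the best policy


def infer_task_weights(phi: np.ndarray, rewards: np.ndarray) -> np.ndarray:
    """Least-squares estimate of w such that rewards ~= phi @ w.

    Assumption: column j of phi holds the reward that base task j would
    have produced for each observed transition (rewards used as features).
    """
    w, *_ = np.linalg.lstsq(phi, rewards, rcond=None)
    return w


# Toy example: 2 stored policies, 1 state, 2 actions, 2 features.
psi = np.zeros((2, 1, 2, 2))
psi[0, 0, 0] = [1.0, 0.0]  # policy 0 taking action 0 accumulates feature 0
psi[0, 0, 1] = [0.2, 0.1]
psi[1, 0, 0] = [0.1, 0.2]
psi[1, 0, 1] = [0.0, 1.0]  # policy 1 taking action 1 accumulates feature 1

print(gpi_action(psi, np.array([0.0, 1.0]), state=0))  # 1
```

Switching to a new task only requires a new weight vector w; none of the stored policies is retrained. In the toy example, w = (0, 1) selects action 1 because the second stored policy accumulates feature 1 when it takes that action.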
Author Information
Andre Barreto (DeepMind)
Diana Borsa (DeepMind)
John Quan (DeepMind)
Tom Schaul (DeepMind)
David Silver (Google DeepMind)
Matteo Hessel (DeepMind)
Daniel J. Mankowitz (Technion)
Augustin Zidek
Remi Munos (DeepMind)
Related Events (a corresponding poster, oral, or spotlight)
2018 Oral: Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement
Wed. Jul 11th 03:20 -- 03:40 PM Room A3
More from the Same Authors
2021: Beyond Fine-Tuning: Transferring Behavior in Reinforcement Learning
Víctor Campos · Pablo Sprechmann · Steven Hansen · Andre Barreto · Steven Kapturowski · Alex Vitvitskyi · Adrià Puigdomenech Badia · Charles Blundell
2021: Discovering Diverse Nearly Optimal Policies with Successor Features
Tom Zahavy · Brendan O'Donoghue · Andre Barreto · Sebastian Flennerhag · Vlad Mnih · Satinder Singh
2023 Poster: Understanding Self-Predictive Learning for Reinforcement Learning
Yunhao Tang · Zhaohan Guo · Pierre Richemond · Bernardo Avila Pires · Yash Chandak · Remi Munos · Mark Rowland · Mohammad Gheshlaghi Azar · Charline Le Lan · Clare Lyle · Andras Gyorgy · Shantanu Thakoor · Will Dabney · Bilal Piot · Daniele Calandriello · Michal Valko
2023 Poster: Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments
Daniel Jarrett · Corentin Tallec · Florent Altché · Thomas Mesnard · Remi Munos · Michal Valko
2023 Poster: Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition
Yash Chandak · Shantanu Thakoor · Zhaohan Guo · Yunhao Tang · Remi Munos · Will Dabney · Diana Borsa
2023 Poster: Towards a better understanding of representation dynamics under TD-learning
Yunhao Tang · Remi Munos
2023 Oral: Adapting to game trees in zero-sum imperfect information games
Côme Fiegel · Pierre Menard · Tadashi Kozuno · Remi Munos · Vianney Perchet · Michal Valko
2023 Poster: Adapting to game trees in zero-sum imperfect information games
Côme Fiegel · Pierre Menard · Tadashi Kozuno · Remi Munos · Vianney Perchet · Michal Valko
2023 Poster: Fast Rates for Maximum Entropy Exploration
Daniil Tiapkin · Denis Belomestny · Daniele Calandriello · Eric Moulines · Remi Munos · Alexey Naumov · Pierre Perrault · Yunhao Tang · Michal Valko · Pierre Menard
2023 Oral: Quantile Credit Assignment
Thomas Mesnard · Wenqi Chen · Alaa Saade · Yunhao Tang · Mark Rowland · Theophane Weber · Clare Lyle · Audrunas Gruslys · Michal Valko · Will Dabney · Georg Ostrovski · Eric Moulines · Remi Munos
2023 Poster: The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation
Mark Rowland · Yunhao Tang · Clare Lyle · Remi Munos · Marc Bellemare · Will Dabney
2023 Poster: Quantile Credit Assignment
Thomas Mesnard · Wenqi Chen · Alaa Saade · Yunhao Tang · Mark Rowland · Theophane Weber · Clare Lyle · Audrunas Gruslys · Michal Valko · Will Dabney · Georg Ostrovski · Eric Moulines · Remi Munos
2023 Poster: DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm
Yunhao Tang · Tadashi Kozuno · Mark Rowland · Anna Harutyunyan · Remi Munos · Bernardo Avila Pires · Michal Valko
2023 Poster: VA-learning as a more efficient alternative to Q-learning
Yunhao Tang · Remi Munos · Mark Rowland · Michal Valko
2023 Poster: Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice
Toshinori Kitamura · Tadashi Kozuno · Yunhao Tang · Nino Vieillard · Michal Valko · Wenhao Yang · Jincheng Mei · Pierre Menard · Mohammad Gheshlaghi Azar · Remi Munos · Olivier Pietquin · Matthieu Geist · Csaba Szepesvari · Wataru Kumagai · Yutaka Matsuo
2022 Workshop: Decision Awareness in Reinforcement Learning
Evgenii Nikishin · Pierluca D'Oro · Doina Precup · Andre Barreto · Amir-massoud Farahmand · Pierre-Luc Bacon
2022 Poster: Generalised Policy Improvement with Geometric Policy Composition
Shantanu Thakoor · Mark Rowland · Diana Borsa · Will Dabney · Remi Munos · Andre Barreto
2022 Oral: Generalised Policy Improvement with Geometric Policy Composition
Shantanu Thakoor · Mark Rowland · Diana Borsa · Will Dabney · Remi Munos · Andre Barreto
2022 Poster: Model-Value Inconsistency as a Signal for Epistemic Uncertainty
Angelos Filos · Eszter Vértes · Zita Marinho · Gregory Farquhar · Diana Borsa · Abe Friesen · Feryal Behbahani · Tom Schaul · Andre Barreto · Simon Osindero
2022 Spotlight: Model-Value Inconsistency as a Signal for Epistemic Uncertainty
Angelos Filos · Eszter Vértes · Zita Marinho · Gregory Farquhar · Diana Borsa · Abe Friesen · Feryal Behbahani · Tom Schaul · Andre Barreto · Simon Osindero
2021 Poster: Emphatic Algorithms for Deep Reinforcement Learning
Ray Jiang · Tom Zahavy · Zhongwen Xu · Adam White · Matteo Hessel · Charles Blundell · Hado van Hasselt
2021 Spotlight: Emphatic Algorithms for Deep Reinforcement Learning
Ray Jiang · Tom Zahavy · Zhongwen Xu · Adam White · Matteo Hessel · Charles Blundell · Hado van Hasselt
2021 Poster: Learning and Planning in Complex Action Spaces
Thomas Hubert · Julian Schrittwieser · Ioannis Antonoglou · Mohammadamin Barekatain · Simon Schmitt · David Silver
2021 Poster: Muesli: Combining Improvements in Policy Optimization
Matteo Hessel · Ivo Danihelka · Fabio Viola · Arthur Guez · Simon Schmitt · Laurent Sifre · Theophane Weber · David Silver · Hado van Hasselt
2021 Spotlight: Learning and Planning in Complex Action Spaces
Thomas Hubert · Julian Schrittwieser · Ioannis Antonoglou · Mohammadamin Barekatain · Simon Schmitt · David Silver
2021 Spotlight: Muesli: Combining Improvements in Policy Optimization
Matteo Hessel · Ivo Danihelka · Fabio Viola · Arthur Guez · Simon Schmitt · Laurent Sifre · Theophane Weber · David Silver · Hado van Hasselt
2020: QA for invited talk 1 Silver
David Silver
2020: Invited talk 1 Silver
David Silver
2020 Poster: Off-Policy Actor-Critic with Shared Experience Replay
Simon Schmitt · Matteo Hessel · Karen Simonyan
2020 Poster: What Can Learned Intrinsic Rewards Capture?
Zeyu Zheng · Junhyuk Oh · Matteo Hessel · Zhongwen Xu · Manuel Kroiss · Hado van Hasselt · David Silver · Satinder Singh
2019: panel discussion with Craig Boutilier (Google Research), Emma Brunskill (Stanford), Chelsea Finn (Google Brain, Stanford, UC Berkeley), Mohammad Ghavamzadeh (Facebook AI), John Langford (Microsoft Research) and David Silver (Deepmind)
Peter Stone · Craig Boutilier · Emma Brunskill · Chelsea Finn · John Langford · David Silver · Mohammad Ghavamzadeh
2019: invited talk by David Silver (Deepmind): AlphaStar: Mastering the Game of StarCraft II
David Silver
2019 Poster: Statistics and Samples in Distributional Reinforcement Learning
Mark Rowland · Robert Dadashi · Saurabh Kumar · Remi Munos · Marc Bellemare · Will Dabney
2019 Oral: Statistics and Samples in Distributional Reinforcement Learning
Mark Rowland · Robert Dadashi · Saurabh Kumar · Remi Munos · Marc Bellemare · Will Dabney
2019 Poster: Composing Entropic Policies using Divergence Correction
Jonathan Hunt · Andre Barreto · Timothy Lillicrap · Nicolas Heess
2019 Poster: An Investigation of Model-Free Planning
Arthur Guez · Mehdi Mirza · Karol Gregor · Rishabh Kabra · Sebastien Racaniere · Theophane Weber · David Raposo · Adam Santoro · Laurent Orseau · Tom Eccles · Greg Wayne · David Silver · Timothy Lillicrap
2019 Oral: An Investigation of Model-Free Planning
Arthur Guez · Mehdi Mirza · Karol Gregor · Rishabh Kabra · Sebastien Racaniere · Theophane Weber · David Raposo · Adam Santoro · Laurent Orseau · Tom Eccles · Greg Wayne · David Silver · Timothy Lillicrap
2019 Oral: Composing Entropic Policies using Divergence Correction
Jonathan Hunt · Andre Barreto · Timothy Lillicrap · Nicolas Heess
2018 Poster: The Uncertainty Bellman Equation and Exploration
Brendan O'Donoghue · Ian Osband · Remi Munos · Vlad Mnih
2018 Poster: IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
Lasse Espeholt · Hubert Soyer · Remi Munos · Karen Simonyan · Vlad Mnih · Tom Ward · Yotam Doron · Vlad Firoiu · Tim Harley · Iain Dunning · Shane Legg · Koray Kavukcuoglu
2018 Poster: Autoregressive Quantile Networks for Generative Modeling
Georg Ostrovski · Will Dabney · Remi Munos
2018 Oral: The Uncertainty Bellman Equation and Exploration
Brendan O'Donoghue · Ian Osband · Remi Munos · Vlad Mnih
2018 Oral: Autoregressive Quantile Networks for Generative Modeling
Georg Ostrovski · Will Dabney · Remi Munos
2018 Oral: IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
Lasse Espeholt · Hubert Soyer · Remi Munos · Karen Simonyan · Vlad Mnih · Tom Ward · Yotam Doron · Vlad Firoiu · Tim Harley · Iain Dunning · Shane Legg · Koray Kavukcuoglu
2018 Poster: Learning to search with MCTSnets
Arthur Guez · Theophane Weber · Ioannis Antonoglou · Karen Simonyan · Oriol Vinyals · Daan Wierstra · Remi Munos · David Silver
2018 Poster: Implicit Quantile Networks for Distributional Reinforcement Learning
Will Dabney · Georg Ostrovski · David Silver · Remi Munos
2018 Oral: Implicit Quantile Networks for Distributional Reinforcement Learning
Will Dabney · Georg Ostrovski · David Silver · Remi Munos
2018 Oral: Learning to search with MCTSnets
Arthur Guez · Theophane Weber · Ioannis Antonoglou · Karen Simonyan · Oriol Vinyals · Daan Wierstra · Remi Munos · David Silver
2017 Workshop: Lifelong Learning: A Reinforcement Learning Approach
Sarath Chandar · Balaraman Ravindran · Daniel J. Mankowitz · Shie Mannor · Tom Zahavy
2017 Poster: FeUdal Networks for Hierarchical Reinforcement Learning
Alexander Vezhnevets · Simon Osindero · Tom Schaul · Nicolas Heess · Max Jaderberg · David Silver · Koray Kavukcuoglu
2017 Poster: The Predictron: End-To-End Learning and Planning
David Silver · Hado van Hasselt · Matteo Hessel · Tom Schaul · Arthur Guez · Tim Harley · Gabriel Dulac-Arnold · David Reichert · Neil Rabinowitz · Andre Barreto · Thomas Degris
2017 Poster: Count-Based Exploration with Neural Density Models
Georg Ostrovski · Marc Bellemare · Aäron van den Oord · Remi Munos
2017 Talk: FeUdal Networks for Hierarchical Reinforcement Learning
Alexander Vezhnevets · Simon Osindero · Tom Schaul · Nicolas Heess · Max Jaderberg · David Silver · Koray Kavukcuoglu
2017 Talk: The Predictron: End-To-End Learning and Planning
David Silver · Hado van Hasselt · Matteo Hessel · Tom Schaul · Arthur Guez · Tim Harley · Gabriel Dulac-Arnold · David Reichert · Neil Rabinowitz · Andre Barreto · Thomas Degris
2017 Talk: Count-Based Exploration with Neural Density Models
Georg Ostrovski · Marc Bellemare · Aäron van den Oord · Remi Munos
2017 Poster: A Distributional Perspective on Reinforcement Learning
Marc Bellemare · Will Dabney · Remi Munos
2017 Poster: Decoupled Neural Interfaces using Synthetic Gradients
Max Jaderberg · Wojciech Czarnecki · Simon Osindero · Oriol Vinyals · Alex Graves · David Silver · Koray Kavukcuoglu
2017 Poster: Automated Curriculum Learning for Neural Networks
Alex Graves · Marc Bellemare · Jacob Menick · Remi Munos · Koray Kavukcuoglu
2017 Poster: Minimax Regret Bounds for Reinforcement Learning
Mohammad Gheshlaghi Azar · Ian Osband · Remi Munos
2017 Talk: A Distributional Perspective on Reinforcement Learning
Marc Bellemare · Will Dabney · Remi Munos
2017 Talk: Automated Curriculum Learning for Neural Networks
Alex Graves · Marc Bellemare · Jacob Menick · Remi Munos · Koray Kavukcuoglu
2017 Talk: Minimax Regret Bounds for Reinforcement Learning
Mohammad Gheshlaghi Azar · Ian Osband · Remi Munos
2017 Talk: Decoupled Neural Interfaces using Synthetic Gradients
Max Jaderberg · Wojciech Czarnecki · Simon Osindero · Oriol Vinyals · Alex Graves · David Silver · Koray Kavukcuoglu