Timezone: »
Spotlight
Muesli: Combining Improvements in Policy Optimization
Matteo Hessel · Ivo Danihelka · Fabio Viola · Arthur Guez · Simon Schmitt · Laurent Sifre · Theophane Weber · David Silver · Hado van Hasselt
We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero's state-of-the-art performance on Atari. Notably, Muesli does so without using deep search: it acts directly with a policy network and has computation speed comparable to model-free baselines. The Atari results are complemented by extensive ablations, and by additional results on continuous control and 9x9 Go.
Author Information
Matteo Hessel (DeepMind)
Ivo Danihelka (DeepMind)
Fabio Viola (DeepMind)
Arthur Guez (Google DeepMind)
Simon Schmitt (DeepMind)
Laurent Sifre (DeepMind)
Theophane Weber (DeepMind)
David Silver (Google DeepMind)
Hado van Hasselt (DeepMind)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Poster: Muesli: Combining Improvements in Policy Optimization »
Tue. Jul 20th 04:00 -- 06:00 PM Room Virtual
More from the Same Authors
-
2022 : Learning to induce causal structure »
Rosemary Nan Ke · Silvia Chiappa · Jane Wang · Jorg Bornschein · Anirudh Goyal · Melanie Rey · Matthew Botvinick · Theophane Weber · Michael Mozer · Danilo J. Rezende -
2023 Oral: Quantile Credit Assignment »
Thomas Mesnard · Wenqi Chen · Alaa Saade · Yunhao Tang · Mark Rowland · Theophane Weber · Clare Lyle · Audrunas Gruslys · Michal Valko · Will Dabney · Georg Ostrovski · Eric Moulines · Remi Munos -
2023 Poster: Investigating the Role of Model-Based Learning in Exploration and Transfer »
Jacob C Walker · Eszter Vértes · Yazhe Li · Gabriel Dulac-Arnold · Ankesh Anand · Theophane Weber · Jessica Hamrick -
2023 Poster: Quantile Credit Assignment »
Thomas Mesnard · Wenqi Chen · Alaa Saade · Yunhao Tang · Mark Rowland · Theophane Weber · Clare Lyle · Audrunas Gruslys · Michal Valko · Will Dabney · Georg Ostrovski · Eric Moulines · Remi Munos -
2022 Poster: Retrieval-Augmented Reinforcement Learning »
Anirudh Goyal · Abe Friesen Friesen · Andrea Banino · Theophane Weber · Nan Rosemary Ke · Adrià Puigdomenech Badia · Arthur Guez · Mehdi Mirza · Peter Humphreys · Ksenia Konyushkova · Michal Valko · Simon Osindero · Timothy Lillicrap · Nicolas Heess · Charles Blundell -
2022 Spotlight: Retrieval-Augmented Reinforcement Learning »
Anirudh Goyal · Abe Friesen Friesen · Andrea Banino · Theophane Weber · Nan Rosemary Ke · Adrià Puigdomenech Badia · Arthur Guez · Mehdi Mirza · Peter Humphreys · Ksenia Konyushkova · Michal Valko · Simon Osindero · Timothy Lillicrap · Nicolas Heess · Charles Blundell -
2022 Poster: Improving Language Models by Retrieving from Trillions of Tokens »
Sebastian Borgeaud · Arthur Mensch · Jordan Hoffmann · Trevor Cai · Eliza Rutherford · Katie Millican · George van den Driessche · Jean-Baptiste Lespiau · Bogdan Damoc · Aidan Clark · Diego de Las Casas · Aurelia Guy · Jacob Menick · Roman Ring · Tom Hennigan · Saffron Huang · Loren Maggiore · Chris Jones · Albin Cassirer · Andy Brock · Michela Paganini · Geoffrey Irving · Oriol Vinyals · Simon Osindero · Karen Simonyan · Jack Rae · Erich Elsen · Laurent Sifre -
2022 Poster: Unified Scaling Laws for Routed Language Models »
Aidan Clark · Diego de Las Casas · Aurelia Guy · Arthur Mensch · Michela Paganini · Jordan Hoffmann · Bogdan Damoc · Blake Hechtman · Trevor Cai · Sebastian Borgeaud · George van den Driessche · Eliza Rutherford · Tom Hennigan · Matthew Johnson · Albin Cassirer · Chris Jones · Elena Buchatskaya · David Budden · Laurent Sifre · Simon Osindero · Oriol Vinyals · Marc'Aurelio Ranzato · Jack Rae · Erich Elsen · Koray Kavukcuoglu · Karen Simonyan -
2022 Spotlight: Improving Language Models by Retrieving from Trillions of Tokens »
Sebastian Borgeaud · Arthur Mensch · Jordan Hoffmann · Trevor Cai · Eliza Rutherford · Katie Millican · George van den Driessche · Jean-Baptiste Lespiau · Bogdan Damoc · Aidan Clark · Diego de Las Casas · Aurelia Guy · Jacob Menick · Roman Ring · Tom Hennigan · Saffron Huang · Loren Maggiore · Chris Jones · Albin Cassirer · Andy Brock · Michela Paganini · Geoffrey Irving · Oriol Vinyals · Simon Osindero · Karen Simonyan · Jack Rae · Erich Elsen · Laurent Sifre -
2022 Oral: Unified Scaling Laws for Routed Language Models »
Aidan Clark · Diego de Las Casas · Aurelia Guy · Arthur Mensch · Michela Paganini · Jordan Hoffmann · Bogdan Damoc · Blake Hechtman · Trevor Cai · Sebastian Borgeaud · George van den Driessche · Eliza Rutherford · Tom Hennigan · Matthew Johnson · Albin Cassirer · Chris Jones · Elena Buchatskaya · David Budden · Laurent Sifre · Simon Osindero · Oriol Vinyals · Marc'Aurelio Ranzato · Jack Rae · Erich Elsen · Koray Kavukcuoglu · Karen Simonyan -
2021 Poster: Emphatic Algorithms for Deep Reinforcement Learning »
Ray Jiang · Tom Zahavy · Zhongwen Xu · Adam White · Matteo Hessel · Charles Blundell · Hado van Hasselt -
2021 Spotlight: Emphatic Algorithms for Deep Reinforcement Learning »
Ray Jiang · Tom Zahavy · Zhongwen Xu · Adam White · Matteo Hessel · Charles Blundell · Hado van Hasselt -
2021 Poster: Counterfactual Credit Assignment in Model-Free Reinforcement Learning »
Thomas Mesnard · Theophane Weber · Fabio Viola · Shantanu Thakoor · Alaa Saade · Anna Harutyunyan · Will Dabney · Thomas Stepleton · Nicolas Heess · Arthur Guez · Eric Moulines · Marcus Hutter · Lars Buesing · Remi Munos -
2021 Poster: Learning and Planning in Complex Action Spaces »
Thomas Hubert · Julian Schrittwieser · Ioannis Antonoglou · Mohammadamin Barekatain · Simon Schmitt · David Silver -
2021 Spotlight: Counterfactual Credit Assignment in Model-Free Reinforcement Learning »
Thomas Mesnard · Theophane Weber · Fabio Viola · Shantanu Thakoor · Alaa Saade · Anna Harutyunyan · Will Dabney · Thomas Stepleton · Nicolas Heess · Arthur Guez · Eric Moulines · Marcus Hutter · Lars Buesing · Remi Munos -
2021 Spotlight: Learning and Planning in Complex Action Spaces »
Thomas Hubert · Julian Schrittwieser · Ioannis Antonoglou · Mohammadamin Barekatain · Simon Schmitt · David Silver -
2020 : QA for invited talk 1 Silver »
David Silver -
2020 : Invited talk 1 Silver »
David Silver -
2020 Workshop: Inductive Biases, Invariances and Generalization in Reinforcement Learning »
Anirudh Goyal · Rosemary Nan Ke · Jane Wang · Stefan Bauer · Theophane Weber · Fabio Viola · Bernhard Schölkopf · Stefan Bauer -
2020 Poster: Off-Policy Actor-Critic with Shared Experience Replay »
Simon Schmitt · Matteo Hessel · Karen Simonyan -
2020 Poster: What Can Learned Intrinsic Rewards Capture? »
Zeyu Zheng · Junhyuk Oh · Matteo Hessel · Zhongwen Xu · Manuel Kroiss · Hado van Hasselt · David Silver · Satinder Singh -
2019 : panel discussion with Craig Boutilier (Google Research), Emma Brunskill (Stanford), Chelsea Finn (Google Brain, Stanford, UC Berkeley), Mohammad Ghavamzadeh (Facebook AI), John Langford (Microsoft Research) and David Silver (Deepmind) »
Peter Stone · Craig Boutilier · Emma Brunskill · Chelsea Finn · John Langford · David Silver · Mohammad Ghavamzadeh -
2019 : invited talk by David Silver (Deepmind): AlphaStar: Mastering the Game of StarCraft II »
David Silver -
2019 Poster: An Investigation of Model-Free Planning »
Arthur Guez · Mehdi Mirza · Karol Gregor · Rishabh Kabra · Sebastien Racaniere · Theophane Weber · David Raposo · Adam Santoro · Laurent Orseau · Tom Eccles · Greg Wayne · David Silver · Timothy Lillicrap -
2019 Oral: An Investigation of Model-Free Planning »
Arthur Guez · Mehdi Mirza · Karol Gregor · Rishabh Kabra · Sebastien Racaniere · Theophane Weber · David Raposo · Adam Santoro · Laurent Orseau · Tom Eccles · Greg Wayne · David Silver · Timothy Lillicrap -
2018 Poster: Learning to Coordinate with Coordination Graphs in Repeated Single-Stage Multi-Agent Decision Problems »
Eugenio Bargiacchi · Timothy Verstraeten · Diederik Roijers · Ann Nowé · Hado van Hasselt -
2018 Oral: Learning to Coordinate with Coordination Graphs in Repeated Single-Stage Multi-Agent Decision Problems »
Eugenio Bargiacchi · Timothy Verstraeten · Diederik Roijers · Ann Nowé · Hado van Hasselt -
2018 Poster: Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement »
Andre Barreto · Diana Borsa · John Quan · Tom Schaul · David Silver · Matteo Hessel · Daniel J. Mankowitz · Augustin Zidek · Remi Munos -
2018 Poster: Generative Temporal Models with Spatial Memory for Partially Observed Environments »
Marco Fraccaro · Danilo J. Rezende · Yori Zwols · Alexander Pritzel · S. M. Ali Eslami · Fabio Viola -
2018 Poster: Learning to search with MCTSnets »
Arthur Guez · Theophane Weber · Ioannis Antonoglou · Karen Simonyan · Oriol Vinyals · Daan Wierstra · Remi Munos · David Silver -
2018 Poster: Implicit Quantile Networks for Distributional Reinforcement Learning »
Will Dabney · Georg Ostrovski · David Silver · Remi Munos -
2018 Oral: Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement »
Andre Barreto · Diana Borsa · John Quan · Tom Schaul · David Silver · Matteo Hessel · Daniel J. Mankowitz · Augustin Zidek · Remi Munos -
2018 Oral: Generative Temporal Models with Spatial Memory for Partially Observed Environments »
Marco Fraccaro · Danilo J. Rezende · Yori Zwols · Alexander Pritzel · S. M. Ali Eslami · Fabio Viola -
2018 Oral: Implicit Quantile Networks for Distributional Reinforcement Learning »
Will Dabney · Georg Ostrovski · David Silver · Remi Munos -
2018 Oral: Learning to search with MCTSnets »
Arthur Guez · Theophane Weber · Ioannis Antonoglou · Karen Simonyan · Oriol Vinyals · Daan Wierstra · Remi Munos · David Silver -
2017 Poster: FeUdal Networks for Hierarchical Reinforcement Learning »
Alexander Vezhnevets · Simon Osindero · Tom Schaul · Nicolas Heess · Max Jaderberg · David Silver · Koray Kavukcuoglu -
2017 Poster: The Predictron: End-To-End Learning and Planning »
David Silver · Hado van Hasselt · Matteo Hessel · Tom Schaul · Arthur Guez · Tim Harley · Gabriel Dulac-Arnold · David Reichert · Neil Rabinowitz · Andre Barreto · Thomas Degris -
2017 Talk: FeUdal Networks for Hierarchical Reinforcement Learning »
Alexander Vezhnevets · Simon Osindero · Tom Schaul · Nicolas Heess · Max Jaderberg · David Silver · Koray Kavukcuoglu -
2017 Talk: The Predictron: End-To-End Learning and Planning »
David Silver · Hado van Hasselt · Matteo Hessel · Tom Schaul · Arthur Guez · Tim Harley · Gabriel Dulac-Arnold · David Reichert · Neil Rabinowitz · Andre Barreto · Thomas Degris -
2017 Poster: Decoupled Neural Interfaces using Synthetic Gradients »
Max Jaderberg · Wojciech Czarnecki · Simon Osindero · Oriol Vinyals · Alex Graves · David Silver · Koray Kavukcuoglu -
2017 Poster: Video Pixel Networks »
Nal Kalchbrenner · Karen Simonyan · Aäron van den Oord · Ivo Danihelka · Oriol Vinyals · Alex Graves · Koray Kavukcuoglu -
2017 Talk: Video Pixel Networks »
Nal Kalchbrenner · Karen Simonyan · Aäron van den Oord · Ivo Danihelka · Oriol Vinyals · Alex Graves · Koray Kavukcuoglu -
2017 Talk: Decoupled Neural Interfaces using Synthetic Gradients »
Max Jaderberg · Wojciech Czarnecki · Simon Osindero · Oriol Vinyals · Alex Graves · David Silver · Koray Kavukcuoglu