Poster

Muesli: Combining Improvements in Policy Optimization

Matteo Hessel ⋅ Ivo Danihelka ⋅ Fabio Viola ⋅ Arthur Guez ⋅ Simon Schmitt ⋅ Laurent Sifre ⋅ Theophane Weber ⋅ David Silver ⋅ Hado van Hasselt

Keywords: Deep RL, Reinforcement Learning and Planning

2021 Poster

Abstract

We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero's state-of-the-art performance on Atari. Notably, Muesli does so without using deep search: it acts directly with a policy network and has computation speed comparable to model-free baselines. The Atari results are complemented by extensive ablations, and by additional results on continuous control and 9x9 Go.
