Spotlight

Muesli: Combining Improvements in Policy Optimization

Matteo Hessel · Ivo Danihelka · Fabio Viola · Arthur Guez · Simon Schmitt · Laurent Sifre · Theophane Weber · David Silver · Hado van Hasselt

Session: Deep Reinforcement Learning 1
Tue 20 Jul 5:30 a.m. — 5:35 a.m. PDT

We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero's state-of-the-art performance on Atari. Notably, Muesli does so without using deep search: it acts directly with a policy network and has computation speed comparable to model-free baselines. The Atari results are complemented by extensive ablations, and by additional results on continuous control and 9x9 Go.
