
Muesli: Combining Improvements in Policy Optimization

About

We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero's state-of-the-art performance on Atari. Notably, Muesli does so without using deep search: it acts directly with a policy network and has computation speed comparable to model-free baselines. The Atari results are complemented by extensive ablations, and by additional results on continuous control and 9x9 Go.

Matteo Hessel, Ivo Danihelka, Fabio Viola, Arthur Guez, Simon Schmitt, Laurent Sifre, Theophane Weber, David Silver, Hado van Hasselt • 2021

Related benchmarks

Task | Dataset | Result | Rank
Reinforcement Learning | ALE Atari 57 games | HWRB: 5 | 16
Reinforcement Learning | Atari-57 (full) | HWRB: 5 | 13
Atari Game Playing | Atari 57 games 200M environment frames | Median Human-Normalized Score: 1.04e+3 | 11
Reinforcement Learning | Atari 57 (ALE) 200M frames sticky actions | Median Human-Normalized Score: 1.04e+3 | 9
Reinforcement Learning | Atari 2600 (test) | Alien Score: 1.62e+4 | 5
