Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

About

Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess and Go, where a perfect simulator is available. However, in real-world problems the dynamics governing the environment are often complex and unknown. In this work we present the MuZero algorithm which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. MuZero learns a model that, when applied iteratively, predicts the quantities most directly relevant to planning: the reward, the action-selection policy, and the value function. When evaluated on 57 different Atari games (the canonical video game environment for testing AI techniques, in which model-based planning approaches have historically struggled), our new algorithm achieved a new state of the art. When evaluated on Go, chess and shogi, without any knowledge of the game rules, MuZero matched the superhuman performance of the AlphaZero algorithm, which was supplied with the game rules.
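The abstract's central idea, a model that is applied iteratively to predict reward, policy and value, can be illustrated with a small sketch. In the full paper the model is factored into a representation function h, a dynamics function g and a prediction function f. The functions below are hypothetical hand-written stand-ins for those trained networks (all names, the 3-action space and the arithmetic are assumptions for illustration), and the tree search that would call them is omitted, so this shows only the data flow of unrolling the model.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

NUM_ACTIONS = 3  # hypothetical size of the action space


def representation(observation: List[float]) -> List[float]:
    """Toy h (hypothetical): encode a raw observation into an initial hidden state."""
    return [0.5 * x for x in observation]


def dynamics(state: List[float], action: int) -> Tuple[float, List[float]]:
    """Toy g (hypothetical): map (hidden state, action) to (reward, next hidden state)."""
    next_state = [x + 0.1 * (action + 1) for x in state]
    reward = sum(next_state) - sum(state)  # toy reward: growth of the state
    return reward, next_state


def prediction(state: List[float]) -> Tuple[List[float], float]:
    """Toy f (hypothetical): map a hidden state to (policy over actions, value)."""
    scores = [state[0] * (a + 1) for a in range(NUM_ACTIONS)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(exps)
    policy = [e / total for e in exps]
    value = sum(state) / len(state)  # toy value: mean of the hidden state
    return policy, value


@dataclass
class Step:
    reward: float        # predicted reward for the action that led here
    policy: List[float]  # predicted action-selection policy
    value: float         # predicted value of this hidden state


def unroll(observation: List[float], actions: List[int]) -> List[Step]:
    """Encode one real observation, then step purely inside the learned model."""
    state = representation(observation)
    policy, value = prediction(state)
    # No action has been taken at the root, so its reward is set to 0.0 here.
    steps = [Step(reward=0.0, policy=policy, value=value)]
    for action in actions:
        # Only the model's own dynamics are used; no environment is queried.
        reward, state = dynamics(state, action)
        policy, value = prediction(state)
        steps.append(Step(reward=reward, policy=policy, value=value))
    return steps


if __name__ == "__main__":
    for depth, step in enumerate(unroll([1.0, 2.0], actions=[0, 2, 1])):
        print(depth, round(step.reward, 3),
              [round(p, 3) for p in step.policy], round(step.value, 3))
```

Because every step after the first operates on hidden states rather than observations, a search can expand nodes without access to a simulator or the game rules; in MuZero, the tree search expands nodes by calling the dynamics and prediction functions in this way.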

Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver • 2019

Related benchmarks

Task	Dataset	Result
Reinforcement Learning	Atari100k (test)	Alien Score: 530	23
Reinforcement Learning	Atari 57	Atlantis: 1.67e+6	21
Reinforcement Learning	Atari 2600 57 games (test)	Median Human-Normalized Score: 741.7	15
Reinforcement Learning	Atari-57 (full)	HWRB: 19	13
Reinforcement Learning	Atari 57 full suite 2600	Games Above Human Count: 51	11
Deep Sea exploration	Deep Sea 40x40	Goal Discovery Success Rate: 0.00e+0	9
Ball In Cup Catch	DMControl 100k (test)	Score: 5.42e+5	7
Reacher Easy	DMControl 100k (test)	Score: 4.93e+5	7
Cartpole Swingup	DMControl 100k (test)	Performance Score: 218.5	7
Reinforcement Learning	Atari 100k raw (test)	Alien: 530	7

Only 10 of the 17 benchmark rows are listed.
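One row above reports a median human-normalized Atari score. The listing does not define the metric; a common convention (assumed here) normalizes each game's score as (agent − random) / (human − random) × 100, so 0% corresponds to random play and 100% to the human reference, and then takes the median across games. A minimal sketch with made-up per-game numbers (the function names and data are hypothetical, not MuZero's results):

```python
from statistics import median
from typing import Dict, Tuple


def human_normalized(agent: float, random_play: float, human: float) -> float:
    """Percent score where 0 = random play and 100 = the human reference (assumed convention)."""
    return 100.0 * (agent - random_play) / (human - random_play)


def median_human_normalized(results: Dict[str, Tuple[float, float, float]]) -> float:
    """Median over games of the human-normalized score.

    results maps a game name to (agent score, random score, human score).
    """
    return median(human_normalized(a, r, h) for a, r, h in results.values())


if __name__ == "__main__":
    # Made-up scores for three hypothetical games (not from the paper).
    results = {
        "game_a": (150.0, 50.0, 150.0),   # normalizes to 100%
        "game_b": (50.0, 0.0, 100.0),     # normalizes to 50%
        "game_c": (900.0, 100.0, 500.0),  # normalizes to 200%
    }
    print(median_human_normalized(results))
```

The median is commonly preferred over the mean here because a handful of games with extremely high normalized scores can otherwise dominate the average.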
