Transformers are Sample-Efficient World Models

About

Deep reinforcement learning agents are notoriously sample inefficient, which considerably limits their application to real-world problems. Recently, many model-based methods have been designed to address this issue, with learning in the imagination of a world model being one of the most prominent approaches. However, while virtually unlimited interaction with a simulated environment sounds appealing, the world model has to be accurate over extended periods of time. Motivated by the success of Transformers in sequence modeling tasks, we introduce IRIS, a data-efficient agent that learns in a world model composed of a discrete autoencoder and an autoregressive Transformer. With the equivalent of only two hours of gameplay in the Atari 100k benchmark, IRIS achieves a mean human normalized score of 1.046, and outperforms humans on 10 out of 26 games, setting a new state of the art for methods without lookahead search. To foster future research on Transformers and world models for sample-efficient reinforcement learning, we release our code and models at https://github.com/eloialonso/iris.

Vincent Micheli, Eloi Alonso, Fran\c{c}ois Fleuret• 2022

Related benchmarks

Task	Dataset	Result
Reinforcement Learning	Atari 100k	Alien Score420	41
Reinforcement Learning	Atari 100K (test)	Mean Score1.93	21
Navigation	PointMaze	Success Rate74	21
Reinforcement Learning	Atari 100k steps (overall)	Game Score: Boxing70.1	9
Environment Interaction	Craftax 1M environment interactions classic	Return25	9
Reinforcement Learning	Atari Assault 100k (test)	HNS2.51	6
Reinforcement Learning	Atari Breakout 100k (test)	HNS285	6
Table-top manipulation	Push T	Success Rate32	5
2D Navigation	Wall	Success Rate4	5
Deformable body manipulation	Rope	Chamfer Distance1.11	4

Showing 10 of 11 rows

Other info

Follow for update

@wizwand_team Discord