IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

About

In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and extended training time. We have developed a new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation. We achieve stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction method called V-trace. We demonstrate the effectiveness of IMPALA for multi-task reinforcement learning on DMLab-30 (a set of 30 tasks from the DeepMind Lab environment (Beattie et al., 2016)) and Atari-57 (all available Atari games in Arcade Learning Environment (Bellemare et al., 2013a)). Our results show that IMPALA is able to achieve better performance than previous agents with less data, and crucially exhibits positive transfer between tasks as a result of its multi-task approach.

Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Volodymir Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, Shane Legg, Koray Kavukcuoglu • 2018
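
The abstract names V-trace as the off-policy correction that keeps learning stable when actors lag behind the learner. As a rough illustration only, the sketch below computes V-trace value targets for a single trajectory with NumPy, using truncated importance weights. The function name `vtrace_targets`, its argument names, and the default values are assumptions made for this example and are not taken from the authors' code.

```python
import numpy as np

def vtrace_targets(log_rhos, rewards, values, bootstrap_value,
                   discount=0.99, rho_bar=1.0, c_bar=1.0):
    """Sketch of V-trace targets for one trajectory of length T.

    log_rhos: log(pi(a_t|x_t) / mu(a_t|x_t)), target vs. behaviour policy.
    values: V(x_t) for t = 0..T-1; bootstrap_value: V(x_T).
    All names and defaults here are illustrative assumptions.
    """
    log_rhos = np.asarray(log_rhos, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)

    rhos = np.exp(log_rhos)
    clipped_rhos = np.minimum(rho_bar, rhos)  # truncated weight on the TD error
    cs = np.minimum(c_bar, rhos)              # truncated "trace" coefficients

    # Importance-weighted temporal-difference errors.
    values_next = np.append(values[1:], bootstrap_value)
    deltas = clipped_rhos * (rewards + discount * values_next - values)

    # Backward recursion: v_t - V(x_t) = delta_t + discount * c_t * (v_{t+1} - V(x_{t+1})).
    corrections = np.zeros_like(values)
    acc = 0.0
    for t in reversed(range(len(rewards))):
        acc = deltas[t] + discount * cs[t] * acc
        corrections[t] = acc
    return values + corrections
```

When the behaviour and target policies coincide (all `log_rhos` equal to zero) and both truncation levels are at least 1, these targets reduce to the ordinary on-policy n-step return.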

Related benchmarks

Task | Dataset | Result | Rank
Reinforcement Learning | Atari 2600 MONTEZUMA'S REVENGE | Score: 2.64e+3 | 45
Atari Game Playing | Pitfall! | Score: -1.2 | 25
Reinforcement Learning | ALE Atari 57 games | HWRB: 15 | 16
Reinforcement Learning | Atari-57 (test) | Median Human Norm Return: 191.8 | 15
Reinforcement Learning | Atari 2600 57 games (test) | Median Human-Normalized Score: 191.8 | 15
Reinforcement Learning | Atari-57 (full) | HWRB: 15 | 13
Atari Game Playing | Atari 57 games 200M environment frames | Median Human-Normalized Score: 192 | 11
Reinforcement Learning | Atari 2600 (test) | Alien Score: 1.99e+3 | 10
Reinforcement Learning | Atari 57 (ALE) 200M frames sticky actions | Median Human-Normalized Score: 192 | 9
Multi-task reinforcement learning | DMLab-30 (test) | Mean Capped Human Score: 58.4 | 8
Showing 10 of 16 rows
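
Several of the benchmark entries report human-normalized metrics. As a hedged sketch of how such aggregates are commonly computed, the example below normalizes each task's score so that a random policy maps to 0 and a human reference maps to 100, then takes the median across tasks or the mean after capping each task at 100. The function names and the input numbers are hypothetical and are not results from the paper.

```python
import numpy as np

def human_normalized(agent, random, human):
    """Per-task score in percent: 0 = random policy, 100 = human reference."""
    agent, random, human = (np.asarray(x, dtype=float)
                            for x in (agent, random, human))
    return 100.0 * (agent - random) / (human - random)

def median_human_normalized(agent, random, human):
    """Median of the per-task human-normalized scores."""
    return float(np.median(human_normalized(agent, random, human)))

def mean_capped_human_normalized(agent, random, human, cap=100.0):
    """Mean of per-task scores after capping each task at `cap`."""
    scores = human_normalized(agent, random, human)
    return float(np.mean(np.minimum(scores, cap)))

# Hypothetical raw scores for three tasks (illustrative only).
agent_scores = [30.0, 150.0, 5.0]
random_scores = [10.0, 50.0, 0.0]
human_scores = [110.0, 100.0, 10.0]

print(median_human_normalized(agent_scores, random_scores, human_scores))
print(mean_capped_human_normalized(agent_scores, random_scores, human_scores))
```

Capping prevents a few tasks with far-above-human scores from dominating the average, which is why a capped mean and a median can rank agents differently.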
