Massively Parallel Methods for Deep Reinforcement Learning

About

We present the first massively distributed architecture for deep reinforcement learning. This architecture uses four main components: parallel actors that generate new behaviour; parallel learners that are trained from stored experience; a distributed neural network to represent the value function or behaviour policy; and a distributed store of experience. We used our architecture to implement the Deep Q-Network algorithm (DQN). Our distributed algorithm was applied to 49 Atari 2600 games from the Arcade Learning Environment, using identical hyperparameters. Our performance surpassed non-distributed DQN in 41 of the 49 games and also reduced the wall-time required to achieve these results by an order of magnitude on most games.

Arun Nair, Praveen Srinivasan, Sam Blackwell, Cagdas Alcicek, Rory Fearon, Alessandro De Maria, Vedavyas Panneershelvam, Mustafa Suleyman, Charles Beattie, Stig Petersen, Shane Legg, Volodymyr Mnih, Koray Kavukcuoglu, David Silver • 2015
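
The four components named in the abstract can be illustrated with a small single-process sketch. It is not the authors' implementation: the chain environment, the Q-table standing in for the distributed neural network, the round-robin scheduling in place of real parallelism, and every class and function name (`ParameterServer`, `ReplayMemory`, `Actor`, `Learner`, `train`) are assumptions made for illustration only.

```python
import random
from collections import deque

N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9  # toy chain task (assumed), not Atari


def step(state, action):
    """Toy environment: action 1 moves right, action 0 stays put.
    Entering the last state pays reward 1 and ends the episode."""
    if action == 0:
        return state, 0.0, False
    nxt = state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


class ParameterServer:
    """Holds the shared value-function parameters (a Q-table stand-in)."""

    def __init__(self, lr=0.1):
        self.q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
        self.lr = lr

    def pull(self):
        # Workers receive a copy, so their view can go stale between syncs.
        return [row[:] for row in self.q]

    def push(self, updates):
        # updates: list of (state, action, td_error) computed by a learner.
        for s, a, td_error in updates:
            self.q[s][a] += self.lr * td_error


class ReplayMemory:
    """Shared store of experience written by actors and read by learners."""

    def __init__(self, capacity=10_000):
        self.buf = deque(maxlen=capacity)

    def add(self, transition):
        self.buf.append(transition)

    def sample(self, n, rng):
        return rng.sample(list(self.buf), min(n, len(self.buf)))


class Actor:
    """Generates behaviour with an epsilon-greedy policy over pulled parameters."""

    def __init__(self, server, memory, rng, eps=0.3):
        self.server, self.memory, self.rng, self.eps = server, memory, rng, eps
        self.state = 0

    def act(self, n_steps):
        q = self.server.pull()
        for _ in range(n_steps):
            s = self.state
            if self.rng.random() < self.eps:
                a = self.rng.randrange(N_ACTIONS)
            else:
                a = max(range(N_ACTIONS), key=lambda i: q[s][i])
            s2, r, done = step(s, a)
            self.memory.add((s, a, r, s2, done))
            self.state = 0 if done else s2


class Learner:
    """Samples stored experience and sends Q-learning updates to the server."""

    def __init__(self, server, memory, rng):
        self.server, self.memory, self.rng = server, memory, rng

    def learn(self, batch_size):
        q = self.server.pull()
        updates = []
        for s, a, r, s2, done in self.memory.sample(batch_size, self.rng):
            target = r if done else r + GAMMA * max(q[s2])
            updates.append((s, a, target - q[s][a]))
        self.server.push(updates)


def train(n_workers=4, rounds=300, seed=0):
    rng = random.Random(seed)
    server, memory = ParameterServer(), ReplayMemory()
    actors = [Actor(server, memory, rng) for _ in range(n_workers)]
    learners = [Learner(server, memory, rng) for _ in range(n_workers)]
    for _ in range(rounds):  # round-robin stands in for parallel execution
        for actor in actors:
            actor.act(10)
        for learner in learners:
            learner.learn(8)
    return server.q


q = train()
greedy_policy = [max(range(N_ACTIONS), key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

The point of the separation is that actors and learners only ever touch the shared parameters and the shared experience store, so either group could in principle be scaled out independently.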

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Reinforcement Learning | Atari 2600 MONTEZUMA'S REVENGE | Score: 84 | 45 |
| Reinforcement Learning | Atari 2600 57 games (test) | -- | 15 |
| Atari Game Playing | Atari 2600 57 games human starts evaluation metric | Median Human-Normalized Score: 71.3 | 14 |
| Reinforcement Learning | Atari 2600 Arcade Learning Environment (evaluation) | Montezuma's Revenge Score: 4 | 11 |
| Game Playing | Atari 2600 human starts 49 games (test) | Median Normalized Score: 47.5 | 3 |
| Atari games | Atari 2600 49 games, no-op starts (test) | Median Normalized Performance: 93.5 | 2 |
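
Several of the results above are median normalized scores. This listing does not define the metric, so the sketch below assumes the convention commonly used for Atari benchmarks: each game's raw score is rescaled so that random play maps to 0 and human play to 100, and the median is taken across games. The function name and all per-game numbers are hypothetical and are not taken from the paper.

```python
from statistics import median


def human_normalized(agent, random_play, human):
    """Percentage of the random-to-human score gap closed by the agent."""
    return 100.0 * (agent - random_play) / (human - random_play)


# Hypothetical raw scores per game: (agent, random play, human). Illustrative only.
scores = {
    "game_a": (150.0, 10.0, 210.0),
    "game_b": (40.0, 0.0, 20.0),
    "game_c": (5.0, 5.0, 105.0),
}

normalized = {game: human_normalized(*s) for game, s in scores.items()}
median_score = median(normalized.values())
print(normalized, median_score)
```

Using the median rather than the mean keeps a single game with a very large normalized score from dominating the summary.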