Exploration by Random Network Distillation

About

We introduce an exploration bonus for deep reinforcement learning methods that is easy to implement and adds minimal overhead to the computation performed. The bonus is the error of a neural network predicting features of the observations given by a fixed, randomly initialized neural network. We also introduce a method to flexibly combine intrinsic and extrinsic rewards. We find that the random network distillation (RND) bonus combined with this increased flexibility enables significant progress on several hard-exploration Atari games. In particular, we establish state-of-the-art performance on Montezuma's Revenge, a game famously difficult for deep reinforcement learning methods. To the best of our knowledge, this is the first method that achieves better-than-average human performance on this game without using demonstrations or having access to the underlying state of the game, and it occasionally completes the first level.
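The bonus described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes NumPy, toy vector observations, a single-layer random target, and a linear predictor trained by plain gradient descent. The names `W_TARGET`, `Predictor`, `OBS_DIM`, and `FEAT_DIM` are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, FEAT_DIM = 8, 4  # hypothetical sizes, chosen only for illustration

# Fixed, randomly initialized target network: never trained.
W_TARGET = rng.normal(size=(OBS_DIM, FEAT_DIM))


def target_features(obs):
    """Features of the observations given by the fixed random network."""
    return np.tanh(obs @ W_TARGET)


class Predictor:
    """Trainable predictor; linear here purely to keep the sketch short."""

    def __init__(self):
        self.w = np.zeros((OBS_DIM, FEAT_DIM))

    def error(self, obs):
        return obs @ self.w - target_features(obs)

    def bonus(self, obs):
        # Exploration bonus: squared prediction error, one value per observation.
        return np.sum(self.error(obs) ** 2, axis=-1)

    def train_step(self, obs, lr=0.05):
        # One gradient-descent step on the mean bonus over the batch.
        grad = 2.0 * obs.T @ self.error(obs) / len(obs)
        self.w -= lr * grad


predictor = Predictor()
familiar = rng.normal(size=(64, OBS_DIM))        # observations seen often
novel = rng.normal(loc=3.0, size=(64, OBS_DIM))  # observations from an unseen region

before = predictor.bonus(familiar).mean()
for _ in range(200):
    predictor.train_step(familiar)
after = predictor.bonus(familiar).mean()
novel_bonus = predictor.bonus(novel).mean()

print(f"familiar bonus before training: {before:.3f}")
print(f"familiar bonus after training:  {after:.3f}")
print(f"novel bonus after training:     {novel_bonus:.3f}")
```

In this sketch the bonus on the familiar observations shrinks as the predictor is fit to them, while observations from a region it was not trained on are expected to keep a larger error. That gap is what makes the prediction error usable as a novelty signal.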

Yuri Burda, Harrison Edwards, Amos Storkey, Oleg Klimov • 2018

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Reinforcement Learning | Atari 2600 MONTEZUMA'S REVENGE | Score | 1.13e+4 | 45 |
| Atari Game Playing | Pitfall! | Score | -3 | 25 |
| Reinforcement Learning | Atari 2600 Montezuma's Revenge ALE (test) | Score | 8.15e+3 | 24 |
| State Exploration | Maze2D Square-b | State Coverage Ratio | 60 | 22 |
| Reinforcement Learning | Atari 2600 Gravitar ALE (test) | Score | 5.60e+3 | 19 |
| Reinforcement Learning | Atari 2600 Private Eye ALE (test) | Score | 1.50e+4 | 19 |
| Reinforcement Learning | Atari 2600 Qbert | Score | 1.22e+4 | 15 |
| Jump | URLB Quadruped 1.0 (test) | Mean Score | 681 | 12 |
| Unsupervised Reinforcement Learning | URL Benchmark (Walker) | Flip Score | 237 | 12 |
| Run | URLB Quadruped 1.0 (test) | Mean Score | 455 | 12 |
Showing 10 of 108 rows