
Deep Exploration via Bootstrapped DQN

About

Efficient exploration in complex environments remains a major challenge for reinforcement learning. We propose bootstrapped DQN, a simple algorithm that explores in a computationally and statistically efficient manner through use of randomized value functions. Unlike dithering strategies such as epsilon-greedy exploration, bootstrapped DQN carries out temporally-extended (or deep) exploration; this can lead to exponentially faster learning. We demonstrate these benefits in complex stochastic MDPs and in the large-scale Arcade Learning Environment. Bootstrapped DQN substantially improves learning times and performance across most Atari games.

Ian Osband, Charles Blundell, Alexander Pritzel, Benjamin Van Roy • 2016
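The per-episode sampling of a randomized value function described in the abstract can be illustrated with a small tabular sketch. This is not the authors' implementation: it replaces the deep network with one Q-table per head, and the chain environment, the function name `bootstrapped_q_learning`, and every hyperparameter are assumptions chosen for illustration. It shows the two ingredients that separate this style of exploration from epsilon-greedy dithering: one head is sampled and followed greedily for a whole episode, and each head is updated only on a random (bootstrapped) subset of transitions.

```python
import random


def bootstrapped_q_learning(n_states=6, n_heads=5, episodes=200, horizon=15,
                            mask_prob=0.5, lr=0.5, gamma=0.99, seed=0):
    """Tabular stand-in for bootstrapped DQN on a toy chain (hypothetical setup)."""
    rng = random.Random(seed)
    n_actions = 2  # 0 = move left, 1 = move right

    # One Q-table per head, randomly initialised so the heads disagree at first.
    q = [[[rng.random() for _ in range(n_actions)] for _ in range(n_states)]
         for _ in range(n_heads)]

    def step(state, action):
        # Assumed toy dynamics: reward 1 only on reaching the right end of the chain.
        nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
        reward = 1.0 if nxt == n_states - 1 else 0.0
        return nxt, reward

    for _ in range(episodes):
        # Sample a single value function and commit to it for the whole episode.
        head = rng.randrange(n_heads)
        state = 0
        for _ in range(horizon):
            # Greedy action under the sampled head: no per-step random dithering.
            action = max(range(n_actions), key=lambda a: q[head][state][a])
            nxt, reward = step(state, action)
            done = nxt == n_states - 1
            for k in range(n_heads):
                # Bootstrap mask: each head sees this transition with prob. mask_prob.
                if rng.random() < mask_prob:
                    target = reward + (0.0 if done else gamma * max(q[k][nxt]))
                    q[k][state][action] += lr * (target - q[k][state][action])
            state = nxt
            if done:
                break
    return q
```

Calling `bootstrapped_q_learning()` returns one Q-table per head. Because the agent commits to a single head for an entire episode, disagreement between heads translates into multi-step exploratory behaviour rather than isolated random actions.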

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Reinforcement Learning | Acrobot v1 | Mean Return: -166.3 | 42 |
| General Competence | G2U Overall | Average Rank (Overall): 10.6 | 30 |
| Utility | G2U Utility | Mean Utility: 0.497 | 30 |
| Curiosity | G2U Curiosity | Mean: 0.856 | 30 |
| Survival | G2U Survival | Mean: 0.121 | 30 |
| Reinforcement Learning | CartPole v1 | Return: 2.68e+5 | 16 |
| Reinforcement Learning | Atari 2600 | Alien Score: 2.44e+3 | 15 |
| Reinforcement Learning | Supply Chain Optimization Environment (test) | Max Reward: 18.2 | 10 |
| Reinforcement Learning | Stochastic GridWorld (20% slip probability) (test) | Success Rate: 15 | 5 |
| Reinforcement Learning | Hopper v5 (strong-drift) | Final Return: 18.14 | 5 |
Showing 10 of 12 rows
