
Deep Exploration via Bootstrapped DQN

About

Efficient exploration in complex environments remains a major challenge for reinforcement learning. We propose bootstrapped DQN, a simple algorithm that explores in a computationally and statistically efficient manner through use of randomized value functions. Unlike dithering strategies such as epsilon-greedy exploration, bootstrapped DQN carries out temporally-extended (or deep) exploration; this can lead to exponentially faster learning. We demonstrate these benefits in complex stochastic MDPs and in the large-scale Arcade Learning Environment. Bootstrapped DQN substantially improves learning times and performance across most Atari games.
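The core mechanism can be sketched in a small tabular stand-in: maintain an ensemble of Q-value "heads", sample one head at the start of each episode and act greedily with respect to it, and train each head only on a random (bootstrapped) subset of transitions so the heads stay diverse. The chain environment, hyperparameters, and the Bernoulli(0.5) bootstrap mask below are illustrative assumptions for the demo, not details taken from the paper.

```python
import random

class ChainEnv:
    """Six-state chain; reward 1 only at the rightmost (terminal) state."""
    def __init__(self, n=6):
        self.n = n
        self.s = 0
    def reset(self):
        self.s = 0
        return self.s
    def step(self, action):  # 0 = left, 1 = right
        self.s = max(0, self.s - 1) if action == 0 else self.s + 1
        done = self.s == self.n - 1
        return self.s, (1.0 if done else 0.0), done

def greedy(q_values, rng):
    """Greedy action with random tie-breaking."""
    best = max(q_values)
    return rng.choice([a for a, v in enumerate(q_values) if v == best])

def train(num_heads=5, episodes=200, alpha=0.5, gamma=0.99, seed=0):
    rng = random.Random(seed)
    env = ChainEnv()
    # One Q-table per bootstrap head (states x 2 actions).
    Q = [[[0.0, 0.0] for _ in range(env.n)] for _ in range(num_heads)]
    for _ in range(episodes):
        k = rng.randrange(num_heads)          # sample one head per episode
        s, done, steps = env.reset(), False, 0
        while not done and steps < 50:
            a = greedy(Q[k][s], rng)          # act greedily w.r.t. that head
            s2, r, done = env.step(a)
            # Bootstrap mask: each head sees each transition with
            # probability 0.5, so heads disagree where data is scarce.
            for h in range(num_heads):
                if rng.random() < 0.5:
                    target = r + (0.0 if done else gamma * max(Q[h][s2]))
                    Q[h][s][a] += alpha * (target - Q[h][s][a])
            s, steps = s2, steps + 1
    return Q
```

Because a single head is held fixed for a whole episode, the agent commits to one hypothesis about value and follows it for many steps, which is what makes the exploration temporally extended rather than per-step dithering like epsilon-greedy.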

Ian Osband, Charles Blundell, Alexander Pritzel, Benjamin Van Roy • 2016

Related benchmarks

Task | Dataset | Metric | Result | Rank
Reinforcement Learning | Atari 2600 | Alien Score | 2.44e+3 | 15
Reinforcement Learning | Acrobot v1 | Mean Return | -166.3 | 14
Reinforcement Learning | Supply Chain Optimization Environment (test) | Max Reward | 18.2 | 10
Reinforcement Learning | Stochastic GridWorld (20% slip probability) (test) | Success Rate | 15 | 5
Reinforcement Learning | Hopper v5 (strong-drift) | Final Return | 18.14 | 5
Reinforcement Learning | CartPole v1 | Return | 2.68e+5 | 5
Reinforcement Learning | CartPole Clean (test) | Clean Return | 2.68e+5 | 4
Reinforcement Learning | CartPole 10% action noise (test) | Return (Noisy) | 185 | 4
