Deep Exploration via Bootstrapped DQN

About

Efficient exploration in complex environments remains a major challenge for reinforcement learning. We propose bootstrapped DQN, a simple algorithm that explores in a computationally and statistically efficient manner through use of randomized value functions. Unlike dithering strategies such as epsilon-greedy exploration, bootstrapped DQN carries out temporally-extended (or deep) exploration; this can lead to exponentially faster learning. We demonstrate these benefits in complex stochastic MDPs and in the large-scale Arcade Learning Environment. Bootstrapped DQN substantially improves learning times and performance across most Atari games.

Ian Osband, Charles Blundell, Alexander Pritzel, Benjamin Van Roy• 2016

Related benchmarks

Task	Dataset	Result
Reinforcement Learning	Acrobot v1	Mean Return-166.3	42
General Competence	G2U Overall	Average Rank (Overall)10.6	30
Utility	G2U Utility	Mean Utility0.497	30
Curiosity	G2U Curiosity	Mean0.856	30
Survival	G2U Survival	Mean0.121	30
Reinforcement Learning	CartPole v1	Return2.68e+5	16
Reinforcement Learning	Atari 2600	Alien Score2.44e+3	15
Reinforcement Learning	Supply Chain Optimization Environment (test)	Max Reward18.2	10
Reinforcement Learning	Stochastic GridWorld (20% slip probability) (test)	Success Rate15	5
Reinforcement Learning	Hopper v5 (strong-drift)	Final Return18.14	5

Showing 10 of 12 rows

Other info

Code

Follow for update

@wizwand_team Discord