Deep Reinforcement Learning with Double Q-learning

About

The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether they harm performance, and whether they can generally be prevented. In this paper, we answer all these questions affirmatively. In particular, we first show that the recent DQN algorithm, which combines Q-learning with a deep neural network, suffers from substantial overestimations in some games in the Atari 2600 domain. We then show that the idea behind the Double Q-learning algorithm, which was introduced in a tabular setting, can be generalized to work with large-scale function approximation. We propose a specific adaptation to the DQN algorithm and show that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but that this also leads to much better performance on several games.
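The abstract does not spell out the update rule, so the following is only a minimal sketch of the idea as it is usually described for this paper: the online network selects the greedy next action, while the target network evaluates it. Function names, argument names, and the example numbers are hypothetical and chosen for illustration; the Q-values are plain NumPy arrays rather than network outputs.

```python
import numpy as np


def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """Standard DQN target: the target network both selects and evaluates
    the next action, via a max over its own estimates."""
    if done:
        return float(reward)
    return float(reward + gamma * np.max(next_q_target))


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target (sketch): the online network selects the action,
    the target network supplies the value of that action."""
    if done:
        return float(reward)
    best_action = int(np.argmax(next_q_online))  # selection: online network
    return float(reward + gamma * next_q_target[best_action])  # evaluation: target network


# Illustrative (made-up) Q-value estimates for one next state with three actions.
next_q_online = np.array([1.0, 2.0, 0.5])
next_q_target = np.array([1.5, 1.0, 3.0])

y_dqn = dqn_target(1.0, next_q_target, gamma=0.5)
y_double = double_dqn_target(1.0, next_q_online, next_q_target, gamma=0.5)
print(y_dqn, y_double)
```

In this toy example the target network's largest estimate belongs to an action the online network does not prefer, so the Double DQN target comes out lower than the standard DQN target. Decoupling selection from evaluation in this way is what is meant to reduce overestimation.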

Hado van Hasselt, Arthur Guez, David Silver • 2015

Related benchmarks

Task	Dataset	Result
Sepsis treatment	MIMIC-IV (test)	WIS 0.664	81
Multi-Objective Offline Policy Evaluation	MIMIC-IV (test)	FQE 0.574	78
Reinforcement Learning	Atari 2600 MONTEZUMA'S REVENGE	Score 42	45
Reinforcement Learning	Acrobot v1	Mean Return -7.72e+3	42
Reinforcement Learning	cartpole	Average Reward 210	29
Atari Game Playing	Pitfall!	Score -30	25
Reinforcement Learning	Atari Breakout	Mean Return 320	23
Reinforcement Learning	Atari 57	Atlantis 6.48e+4	21
Reinforcement Learning	Atari 2600 57 games	Median Human-Normalized Score 117	20
Reinforcement Learning	MountainCar	Avg Episode Reward -100	18

Showing 10 of 104 rows
