Rainbow: Combining Improvements in Deep Reinforcement Learning
About
The deep reinforcement learning community has made several independent improvements to the DQN algorithm. However, it is unclear which of these extensions are complementary and can be fruitfully combined. This paper examines six extensions to the DQN algorithm and empirically studies their combination. Our experiments show that the combination provides state-of-the-art performance on the Atari 2600 benchmark, both in terms of data efficiency and final performance. We also provide results from a detailed ablation study that shows the contribution of each component to overall performance.
Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver • 2017
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Reinforcement Learning | Atari 2600 MONTEZUMA'S REVENGE | Score | 154 | 45 |
| Atari Game Playing | Pitfall! | Score | 0 | 25 |
| Reinforcement Learning | Atari100k (test) | Alien Score | 318.7 | 23 |
| Reinforcement Learning | Atari 2600 57 games | Median Human-Normalized Score | 223 | 20 |
| Reinforcement Learning | ALE Atari 57 games | HWRB | 4 | 16 |
| Reinforcement Learning | Atari 2600 57 games (test) | Median Human-Normalized Score | 231.1 | 15 |
| Deadline Compliance Scheduling | 200 Randomized Tasksets u ≈ 0.87 1.0 | Mean Compliance Rate | 78 | 15 |
| Sudoku Solving | Sudoku 2x2 | Final Reward | 1.3 | 14 |
| Atari Game Playing | Atari 2600 57 games human starts evaluation metric | Median Human-Normalized Score | 153 | 14 |
| Reinforcement Learning | Atari-57 (full) | HWRB | 4 | 13 |