Implicit Quantile Networks for Distributional Reinforcement Learning
About
In this work, we build on recent advances in distributional reinforcement learning to give a generally applicable, flexible, and state-of-the-art distributional variant of DQN. We achieve this by using quantile regression to approximate the full quantile function for the state-action return distribution. Reparameterizing a distribution over the sample space yields an implicitly defined return distribution and gives rise to a large class of risk-sensitive policies. We demonstrate improved performance on the 57 Atari 2600 games in the Arcade Learning Environment (ALE), and use our algorithm's implicitly defined distributions to study the effects of risk-sensitive policies in Atari games.
Will Dabney, Georg Ostrovski, David Silver, Rémi Munos • 2018
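The abstract names three ingredients that lend themselves to a short sketch: a network that embeds sampled quantile fractions τ ~ U(0, 1) and merges them with the state features, the quantile Huber regression loss used to train it, and a risk-sensitive policy obtained by distorting how τ is sampled. The PyTorch code below is a minimal, illustrative sketch under assumptions of ours: it presumes a precomputed state embedding psi(s), and the names (`IQNHead`, `quantile_huber_loss`, `cvar_greedy_action`), layer sizes, and single linear output layer are placeholders, not the authors' released implementation.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class IQNHead(nn.Module):
    """Implicit quantile head: maps a state embedding psi(s) and sampled
    fractions tau ~ U(0, 1) to quantile estimates Z_tau(s, a) per action."""

    def __init__(self, state_dim: int, num_actions: int, embed_dim: int = 64):
        super().__init__()
        self.embed_dim = embed_dim                   # size n of the cosine embedding
        self.phi = nn.Linear(embed_dim, state_dim)   # embeds tau to match psi(s)
        self.out = nn.Linear(state_dim, num_actions)

    def forward(self, psi: torch.Tensor, tau: torch.Tensor) -> torch.Tensor:
        # psi: (B, state_dim); tau: (B, N). Cosine embedding cos(pi * i * tau).
        i = torch.arange(self.embed_dim, device=tau.device, dtype=tau.dtype)
        cos = torch.cos(math.pi * i.view(1, 1, -1) * tau.unsqueeze(-1))
        phi_tau = F.relu(self.phi(cos))              # (B, N, state_dim)
        # Hadamard product merges the state and quantile embeddings.
        return self.out(psi.unsqueeze(1) * phi_tau)  # (B, N, num_actions)


def quantile_huber_loss(pred, target, tau, kappa=1.0):
    """pred: (B, N) quantiles of the taken action; target: (B, N') detached
    target samples; tau: (B, N) fractions that produced `pred`."""
    td = target.unsqueeze(1) - pred.unsqueeze(2)     # pairwise TD errors (B, N, N')
    huber = torch.where(td.abs() <= kappa,
                        0.5 * td.pow(2),
                        kappa * (td.abs() - 0.5 * kappa))
    # Asymmetric weight |tau - 1{td < 0}| turns Huber into quantile regression.
    weight = (tau.unsqueeze(2) - (td.detach() < 0).float()).abs()
    return (weight * huber / kappa).mean(dim=2).sum(dim=1).mean()


def cvar_greedy_action(head, psi, beta=1.0, num_tau=32):
    """Risk-sensitive greedy policy: sampling tau ~ U(0, beta) with beta < 1
    approximates CVaR_beta; beta = 1 recovers the risk-neutral mean."""
    tau = beta * torch.rand(psi.size(0), num_tau, device=psi.device)
    return head(psi, tau).mean(dim=1).argmax(dim=1)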
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Offline Reinforcement Learning | puzzle-4x4-play OGBench 5 tasks v0 | Average Success Rate | 27 | 28 |
| Offline Reinforcement Learning | scene-play OGBench 5 tasks v0 | Average Success Rate | 41 | 26 |
| Offline Reinforcement Learning | cube-double-play OGBench 5 tasks v0 | Average Success Rate | 42 | 19 |
| Offline Reinforcement Learning | puzzle-3x3-play OGBench 5 tasks v0 | Average Success Rate | 15 | 19 |
| Continuous Control | Walker2D v5 | Average Return | 4.77e+3 | 17 |
| Continuous Control | Hopper v5 | Average Return | 3.30e+3 | 15 |
| Atari Game Playing | Atari 2600 57 games (human starts) | Median Human-Normalized Score | 162 | 14 |
| Distributional Reinforcement Learning | American Put Option (test) | CVaR 1.0 | 0.4 | 13 |
| Continuous Control | Humanoid v5 | Average Return | 4.73e+3 | 13 |
| Atari Game Playing | Atari 57 games 200M environment frames | Median Human-Normalized Score | 218 | 11 |
Showing 10 of 38 benchmark rows.