Implicit Quantile Networks for Distributional Reinforcement Learning
About
In this work, we build on recent advances in distributional reinforcement learning to give a generally applicable, flexible, and state-of-the-art distributional variant of DQN. We achieve this by using quantile regression to approximate the full quantile function for the state-action return distribution. By reparameterizing a distribution over the sample space, this yields an implicitly defined return distribution and gives rise to a large class of risk-sensitive policies. We demonstrate improved performance on the 57 Atari 2600 games in the ALE, and use our algorithm's implicitly defined distributions to study the effects of risk-sensitive policies in Atari games.
Will Dabney, Georg Ostrovski, David Silver, Rémi Munos • 2018
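Two ingredients described in the abstract can be sketched concretely: the quantile Huber loss used for quantile regression, and the cosine embedding of sampled quantile fractions that reparameterizes the distribution over the sample space. The following is a minimal NumPy sketch under the paper's standard formulation; function names and the embedding dimension are illustrative, not the authors' reference implementation.

```python
import numpy as np

def huber(u, kappa=1.0):
    # Huber loss: quadratic for |u| <= kappa, linear beyond.
    abs_u = np.abs(u)
    return np.where(abs_u <= kappa,
                    0.5 * u ** 2,
                    kappa * (abs_u - 0.5 * kappa))

def quantile_huber_loss(td_errors, taus, kappa=1.0):
    # rho^kappa_tau(u) = |tau - 1{u < 0}| * Huber_kappa(u) / kappa
    # Asymmetrically weights over-/under-estimation by the fraction tau.
    weight = np.abs(taus - (td_errors < 0).astype(float))
    return weight * huber(td_errors, kappa) / kappa

def cosine_embedding(taus, n=64):
    # phi_j(tau) = cos(pi * j * tau), j = 0..n-1: the embedding of a
    # sampled fraction tau that is combined with the state features.
    j = np.arange(n)
    return np.cos(np.pi * j * taus[:, None])

# Risk-sensitive policies arise by distorting how tau is sampled,
# e.g. tau ~ U(0, alpha) yields a CVaR_alpha-style (risk-averse) policy.
def sample_taus_cvar(batch_size, alpha=1.0, rng=None):
    rng = rng or np.random.default_rng()
    return alpha * rng.uniform(size=batch_size)
```

For a TD error of 0.5 at fraction tau = 0.5 and kappa = 1, the loss is |0.5 - 0| * 0.125 = 0.0625; setting alpha < 1 in `sample_taus_cvar` restricts training and action selection to the lower tail of the return distribution.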
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Atari Game Playing | Atari 2600, 57 games (human-starts evaluation) | Median Human-Normalized Score | 162 | 14 |
| Distributional Reinforcement Learning | American Put Option (test) | CVaR 1.0 | 0.4 | 13 |
| Atari Game Playing | Atari 57 games, 200M environment frames | Median Human-Normalized Score | 218 | 11 |
| Reinforcement Learning | Windy Lunar Lander standard (test) | Expected Value (E) | 32.73 | 10 |
| Reinforcement Learning | 55 Atari games | Mean Human-Normalized Score | 940 | 10 |
| Climate Debias | Climate Debias (test) | Energy Distance (ED) | 0.116 | 8 |
| Precipitation Downscaling | Precip. Downscale (test) | Energy Distance (ED) | 0.393 | 8 |
| Elliptic PDE Inverse Problem | Elliptic PDE Inv. (test) | Energy Distance (ED) | 0.139 | 8 |
| Fluid Flow Prediction | Navier-Stokes (test) | Energy Distance (ED) | 0.263 | 8 |
| GP regression | GP Regression 2D (test) | Energy Distance (ED) | 0.427 | 8 |
Showing 10 of 18 rows.