
Implicit Quantile Networks for Distributional Reinforcement Learning

About

In this work, we build on recent advances in distributional reinforcement learning to give a generally applicable, flexible, and state-of-the-art distributional variant of DQN. We achieve this by using quantile regression to approximate the full quantile function for the state-action return distribution. By reparameterizing a distribution over the sample space, this yields an implicitly defined return distribution and gives rise to a large class of risk-sensitive policies. We demonstrate improved performance on the 57 Atari 2600 games in the ALE, and use our algorithm's implicitly defined distributions to study the effects of risk-sensitive policies in Atari games.
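The quantile-regression objective and risk-sensitive sampling the abstract describes can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: `quantile_huber_loss` and `cvar_taus` are hypothetical helper names, and the shapes assume a single state-action pair with N predicted quantiles and M sampled target returns.

```python
import numpy as np

def quantile_huber_loss(theta, z, taus, kappa=1.0):
    """Quantile regression with a Huber penalty (a sketch of the IQN-style loss).

    theta: (N,) predicted quantile values at fractions taus
    z:     (M,) sampled target returns
    taus:  (N,) quantile fractions in (0, 1)
    """
    # Pairwise errors: each target sample minus each predicted quantile.
    u = z[None, :] - theta[:, None]                      # shape (N, M)
    # Huber component smooths the gradient near zero error.
    huber = np.where(np.abs(u) <= kappa,
                     0.5 * u ** 2,
                     kappa * (np.abs(u) - 0.5 * kappa))
    # Asymmetric quantile weight |tau - 1{u < 0}| pushes each theta_i
    # toward the tau_i-quantile of the target distribution.
    weight = np.abs(taus[:, None] - (u < 0).astype(float))
    return (weight * huber / kappa).mean()

def cvar_taus(n, eta, rng):
    """CVaR(eta)-style distortion: restrict sampled quantile fractions to
    the lower tail [0, eta), yielding a risk-averse value estimate."""
    return eta * rng.uniform(size=n)
```

Sampling quantile fractions through a distortion such as `cvar_taus` (rather than uniformly on (0, 1)) is one way the implicitly defined return distribution gives rise to risk-sensitive policies.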

Will Dabney, Georg Ostrovski, David Silver, Rémi Munos • 2018

Related benchmarks

Task                                    Dataset                                       Metric                          Result    Rank
Offline Reinforcement Learning          puzzle-4x4-play OGBench 5 tasks v0            Average Success Rate            27        28
Offline Reinforcement Learning          scene-play OGBench 5 tasks v0                 Average Success Rate            41        26
Offline Reinforcement Learning          cube-double-play OGBench 5 tasks v0           Average Success Rate            42        19
Offline Reinforcement Learning          puzzle-3x3-play OGBench 5 tasks v0            Average Success Rate            15        19
Continuous Control                      Walker2D v5                                   Average Return                  4.77e+3   17
Continuous Control                      Hopper v5                                     Average Return                  3.30e+3   15
Atari Game Playing                      Atari 2600 57 games (human starts)            Median Human-Normalized Score   162       14
Distributional Reinforcement Learning   American Put Option (test)                    CVaR 1.0                        0.4       13
Continuous Control                      Humanoid v5                                   Average Return                  4.73e+3   13
Atari Game Playing                      Atari 57 games (200M environment frames)      Median Human-Normalized Score   218       11
Showing 10 of 38 rows
