Implicit Quantile Networks for Distributional Reinforcement Learning
About
In this work, we build on recent advances in distributional reinforcement learning to give a generally applicable, flexible, and state-of-the-art distributional variant of DQN. We achieve this by using quantile regression to approximate the full quantile function for the state-action return distribution. Reparameterizing a distribution over the sample space yields an implicitly defined return distribution and gives rise to a large class of risk-sensitive policies. We demonstrate improved performance on the 57 Atari 2600 games in the Arcade Learning Environment (ALE), and use our algorithm's implicitly defined distributions to study the effects of risk-sensitive policies in Atari games.
Will Dabney, Georg Ostrovski, David Silver, Rémi Munos • 2018
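The abstract names three ingredients that lend themselves to a short sketch: a network that embeds sampled quantile fractions τ ~ U(0, 1) and merges them with the state features, the quantile Huber regression loss used to train it, and a risk-sensitive policy obtained by distorting how τ is sampled. The PyTorch code below is a minimal, illustrative sketch under assumptions of ours: it presumes a precomputed state embedding psi(s), and the names (`IQNHead`, `quantile_huber_loss`, `cvar_greedy_action`), layer sizes, and single linear output layer are placeholders, not the authors' released implementation.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class IQNHead(nn.Module):
    """Implicit quantile head: maps a state embedding psi(s) and sampled
    fractions tau ~ U(0, 1) to quantile estimates Z_tau(s, a) per action."""

    def __init__(self, state_dim: int, num_actions: int, embed_dim: int = 64):
        super().__init__()
        self.embed_dim = embed_dim                   # size n of the cosine embedding
        self.phi = nn.Linear(embed_dim, state_dim)   # embeds tau to match psi(s)
        self.out = nn.Linear(state_dim, num_actions)

    def forward(self, psi: torch.Tensor, tau: torch.Tensor) -> torch.Tensor:
        # psi: (B, state_dim); tau: (B, N). Cosine embedding cos(pi * i * tau).
        i = torch.arange(self.embed_dim, device=tau.device, dtype=tau.dtype)
        cos = torch.cos(math.pi * i.view(1, 1, -1) * tau.unsqueeze(-1))
        phi_tau = F.relu(self.phi(cos))              # (B, N, state_dim)
        # Hadamard product merges the state and quantile embeddings.
        return self.out(psi.unsqueeze(1) * phi_tau)  # (B, N, num_actions)


def quantile_huber_loss(pred, target, tau, kappa=1.0):
    """pred: (B, N) quantiles of the taken action; target: (B, N') detached
    target samples; tau: (B, N) fractions that produced `pred`."""
    td = target.unsqueeze(1) - pred.unsqueeze(2)     # pairwise TD errors (B, N, N')
    huber = torch.where(td.abs() <= kappa,
                        0.5 * td.pow(2),
                        kappa * (td.abs() - 0.5 * kappa))
    # Asymmetric weight |tau - 1{td < 0}| turns Huber into quantile regression.
    weight = (tau.unsqueeze(2) - (td.detach() < 0).float()).abs()
    return (weight * huber / kappa).mean(dim=2).sum(dim=1).mean()


def cvar_greedy_action(head, psi, beta=1.0, num_tau=32):
    """Risk-sensitive greedy policy: sampling tau ~ U(0, beta) with beta < 1
    approximates CVaR_beta; beta = 1 recovers the risk-neutral mean."""
    tau = beta * torch.rand(psi.size(0), num_tau, device=psi.device)
    return head(psi, tau).mean(dim=1).argmax(dim=1)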
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Offline Reinforcement Learning | puzzle-4x4-play OGBench 5 tasks v0 | Average Success Rate | 27 | 28 |
| Offline Reinforcement Learning | scene-play OGBench 5 tasks v0 | Average Success Rate | 41 | 26 |
| Offline Reinforcement Learning | cube-double-play OGBench 5 tasks v0 | Average Success Rate | 42 | 19 |
| Offline Reinforcement Learning | puzzle-3x3-play OGBench 5 tasks v0 | Average Success Rate | 15 | 19 |
| Continuous Control | Walker2D v5 | Average Return | 4.77e+3 | 17 |
| Continuous Control | Hopper v5 | Average Return | 3.30e+3 | 15 |
| Atari Game Playing | Atari 2600 57 games (human starts) | Median Human-Normalized Score | 162 | 14 |
| Distributional Reinforcement Learning | American Put Option (test) | CVaR 1.0 | 0.4 | 13 |
| Continuous Control | Humanoid v5 | Average Return | 4.73e+3 | 13 |
| Atari Game Playing | Atari 57 games 200M environment frames | Median Human-Normalized Score | 218 | 11 |
Showing 10 of 38 benchmark rows.