Distributional Reinforcement Learning with Quantile Regression

About

In reinforcement learning an agent interacts with the environment by taking actions and observing the next state and reward. When sampled probabilistically, these state transitions, rewards, and actions can all induce randomness in the observed long-term return. Traditionally, reinforcement learning algorithms average over this randomness to estimate the value function. In this paper, we build on recent work advocating a distributional approach to reinforcement learning in which the distribution over returns is modeled explicitly instead of only estimating the mean. That is, we examine methods of learning the value distribution instead of the value function. We give results that close a number of gaps between the theoretical and algorithmic results given by Bellemare, Dabney, and Munos (2017). First, we extend existing results to the approximate distribution setting. Second, we present a novel distributional reinforcement learning algorithm consistent with our theoretical formulation. Finally, we evaluate this new algorithm on the Atari 2600 games, observing that it significantly outperforms many of the recent improvements on DQN, including the related distributional algorithm C51.
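As a rough illustration of what modeling the return distribution with quantile regression can look like, the sketch below fits a handful of quantiles to a set of sampled returns by minimizing the quantile-regression (pinball) loss. It is a minimal sketch under stated assumptions, not the paper's algorithm: the function names, the use of quantile midpoints as target levels, and the brute-force search over sample values are all choices made here for illustration, and no neural network or Bellman update is involved.

```python
def pinball_loss(estimate, samples, tau):
    """Mean quantile-regression (pinball) loss of `estimate` at level `tau`."""
    total = 0.0
    for z in samples:
        u = z - estimate
        # Under-estimates (u >= 0) are weighted by tau,
        # over-estimates (u < 0) by (1 - tau).
        total += u * tau if u >= 0 else -u * (1.0 - tau)
    return total / len(samples)


def quantile_midpoints(n):
    """Levels (2i + 1) / (2n) for i = 0..n-1 (an assumption of this sketch)."""
    return [(2 * i + 1) / (2 * n) for i in range(n)]


def fit_quantiles(samples, n):
    """For each level, pick the sample value with the lowest pinball loss.

    Brute-force search is used only to keep the example dependency-free;
    a learning algorithm would instead follow the loss gradient.
    """
    candidates = sorted(set(samples))
    return [
        min(candidates, key=lambda c: pinball_loss(c, samples, tau))
        for tau in quantile_midpoints(n)
    ]


# Hypothetical sampled returns: 1.0, 2.0, ..., 100.0
returns = [float(x) for x in range(1, 101)]
quantiles = fit_quantiles(returns, 4)
mean_estimate = sum(quantiles) / len(quantiles)
```

Here `quantiles` summarizes the spread of the sampled returns, while averaging the fitted quantiles gives a single mean-style estimate, which is the quantity a conventional value function would track on its own.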

Will Dabney, Mark Rowland, Marc G. Bellemare, Rémi Munos • 2017

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Continuous Control | MuJoCo Ant | Average Reward | 5.64e+3 | 26 |
| Continuous Control | MuJoCo HalfCheetah | Average Reward | 1.12e+4 | 25 |
| Continuous Control | MuJoCo Reacher | Average Reward | -3.95 | 18 |
| Reinforcement Learning | Atari 2600 57 games (test) | Median Human-Normalized Score | 211 | 15 |
| Continuous Control | Hopper | Average Reward | 3.39e+3 | 15 |
| Futures Trading | ETH Crypto Market | Total Return (%) | -5.94 | 13 |
| Futures Trading | DOT Crypto Market | Total Return (%) | -4.45 | 13 |
| Futures Trading | BTC Crypto Market | Total Return (%) | 2.84 | 13 |
| Continuous Control | MuJoCo Humanoid | Average Reward | 5.00e+3 | 13 |
| Futures Trading | BNB Crypto Market | Total Return (%) | -68.88 | 13 |
Showing 10 of 20 rows
