
Reinforcement Learning with Random Delays

About

Action and observation delays commonly occur in many Reinforcement Learning applications, such as remote control scenarios. We study the anatomy of randomly delayed environments, and show that partially resampling trajectory fragments in hindsight allows for off-policy multi-step value estimation. We apply this principle to derive Delay-Correcting Actor-Critic (DCAC), an algorithm based on Soft Actor-Critic with significantly better performance in environments with delays. This is shown theoretically and also demonstrated practically on a delay-augmented version of the MuJoCo continuous control benchmark.
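The hindsight-resampling idea can be made concrete with a short sketch. The following Python snippet is a minimal illustration under simplified assumptions, not the authors' implementation: it assumes a stored trajectory fragment is a list of (observation, action, reward) steps, that actions issued during the last `total_delay` steps have not yet influenced any recorded observation, and that `policy` stands in for the current SAC policy. All names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def policy(obs):
    # Stand-in for the current (SAC) policy; hypothetical placeholder.
    return rng.normal(size=2)

def resample_fragment(fragment, total_delay):
    """Partially resample a stored trajectory fragment in hindsight.

    `fragment` is a list of dicts with keys 'obs', 'action', 'reward'.
    In a delayed environment, actions recorded during the last
    `total_delay` steps have not yet affected any recorded observation,
    so they can be swapped for fresh on-policy actions without altering
    the recorded dynamics -- the property that enables off-policy
    multi-step value estimation without importance sampling.
    """
    resampled = [dict(step) for step in fragment]
    cutoff = max(len(fragment) - total_delay, 0)
    for i in range(cutoff, len(resampled)):
        resampled[i]["action"] = policy(resampled[i]["obs"])
    return resampled

def multistep_target(fragment, bootstrap_value, gamma=0.99):
    # n-step return over the (partially resampled) fragment,
    # bootstrapped with a critic estimate at the final observation.
    g = bootstrap_value
    for step in reversed(fragment):
        g = step["reward"] + gamma * g
    return g
```

In this sketch, only the tail of the fragment is resampled, which mirrors the paper's observation that delayed actions stored alongside an observation have not yet been applied to the underlying state.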

Simon Ramstedt, Yann Bouteiller, Giovanni Beltrame, Christopher Pal, Jonathan Binas • 2020

Related benchmarks

Task | Dataset | Result | Rank
Continuous Control | MuJoCo Walker2d v4 | Normalized Performance: 85 | 24
Continuous Control | MuJoCo Ant v4 | Normalized Return: 0.25 | 24
Continuous Control | MuJoCo HumanoidStandup v4 | Normalized Performance: 1.16 | 18
Continuous Control | MuJoCo Reacher v4 | Normalized Performance: 102 | 18
Continuous Control | MuJoCo Hopper v4 | Normalized Performance: 1.16 | 18
Continuous Control | MuJoCo Humanoid v4 | Normalized Performance (Ret_nor): 59 | 18
Continuous Control | MuJoCo Pusher v4 | Normalized Performance: 1.29 | 18
Reinforcement Learning | MuJoCo Swimmer v4 | Normalized Performance: 111 | 18
Continuous Control | MuJoCo HalfCheetah v4 | Normalized Performance: 40 | 18
Continuous Control | MuJoCo v4 (test) | HumanoidStandup-v4 Score: 0.35 | 6
