Boosting Reinforcement Learning with Strongly Delayed Feedback Through Auxiliary Short Delays
About
Reinforcement learning (RL) is challenging in the common case of delays between events and their sensory perceptions. State-of-the-art (SOTA) state-augmentation techniques either suffer from state-space explosion or from performance degeneration in stochastic environments. To address these challenges, we present a novel Auxiliary-Delayed Reinforcement Learning (AD-RL) method that leverages auxiliary tasks involving short delays to accelerate RL with long delays, without compromising performance in stochastic environments. Specifically, AD-RL learns a value function for short delays and uses bootstrapping and policy-improvement techniques to adjust it for long delays. We theoretically show that this can greatly reduce the sample complexity. On deterministic and stochastic benchmarks, our method significantly outperforms SOTA approaches in both sample efficiency and policy performance. Code is available at https://github.com/QingyuanWuNothing/AD-RL.
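The core idea, learning a value function under a short auxiliary delay and bootstrapping the long-delay learner from it, can be illustrated in a tabular setting. The following is a minimal, hypothetical sketch (not the authors' implementation): in a toy chain MDP with observation delay, the augmented state pairs a delayed observation with the actions taken since; `q_short` is trained by ordinary Q-learning on a short-delay augmentation, and `q_long` reuses the short-delay TD target instead of bootstrapping on its own (much larger) augmented state space. All names and the environment are illustrative assumptions.

```python
import random
from collections import defaultdict

# Toy chain MDP: positions 0..N-1, reward 1 at the goal state.
N, GOAL, GAMMA, ALPHA = 5, 4, 0.9, 0.5
ACTIONS = (-1, 1)

def aug(states, actions, t, d):
    """Augmented state at time t under delay d: the observation from
    d steps ago plus the d actions taken since (pending actions)."""
    return (states[t - d], tuple(actions[t - d:t]))

def train(episodes=300, d_short=1, d_long=3, horizon=40, seed=0):
    random.seed(seed)
    q_short = defaultdict(float)  # value function for the short auxiliary delay
    q_long = defaultdict(float)   # value function for the long (true) delay
    for _ in range(episodes):
        states, actions = [0], []
        for t in range(horizon):
            a = random.choice(ACTIONS)              # exploratory behaviour policy
            s2 = min(max(states[-1] + a, 0), N - 1)
            r = 1.0 if s2 == GOAL else 0.0
            actions.append(a)
            states.append(s2)
            if t >= d_long:  # both augmented states are defined from here on
                xs = aug(states, actions, t, d_short)
                xs2 = aug(states, actions, t + 1, d_short)
                xl = aug(states, actions, t, d_long)
                # Short-delay learner: standard one-step Q-learning target.
                tgt = r + GAMMA * max(q_short[(xs2, b)] for b in ACTIONS)
                q_short[(xs, a)] += ALPHA * (tgt - q_short[(xs, a)])
                # Long-delay learner: bootstrap from the short-delay value
                # rather than from q_long itself (the auxiliary-delay trick).
                q_long[(xl, a)] += ALPHA * (tgt - q_long[(xl, a)])
    return q_short, q_long
```

Because the short-delay augmented state space is far smaller, `q_short` converges quickly, and `q_long` inherits those targets instead of propagating values through its own exponentially larger space; this mirrors the sample-complexity argument in the paper, though the deep-RL version replaces the tables with function approximators.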
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Continuous Control | MuJoCo Ant v4 | Normalized Return | 0.72 | 24 |
| Continuous Control | MuJoCo Walker2d v4 | Normalized Performance | 112 | 24 |
| Continuous Control | MuJoCo HalfCheetah v4 | Normalized Performance | 107 | 18 |
| Continuous Control | MuJoCo Pusher v4 | Normalized Performance | 1.36 | 18 |
| Reinforcement Learning | MuJoCo Swimmer v4 | Normalized Performance | 271 | 18 |
| Continuous Control | MuJoCo Humanoid v4 | Normalized Performance (Ret_nor) | 98 | 18 |
| Continuous Control | MuJoCo HumanoidStandup v4 | Normalized Performance | 1.22 | 18 |
| Continuous Control | MuJoCo Reacher v4 | Normalized Performance | 103 | 18 |
| Continuous Control | MuJoCo Hopper v4 | Normalized Performance | 1.07 | 18 |
| Continuous Control | MuJoCo v4 (test) | HumanoidStandup-v4 Score | 0.14 | 6 |