
State-Action Inpainting Diffuser for Continuous Control with Delay

About

Signal delay poses a fundamental challenge in continuous control and reinforcement learning (RL) by introducing a temporal gap between interaction and perception. Current solutions have largely evolved along two distinct paradigms: model-free approaches, which use state augmentation to preserve the Markov property, and model-based methods, which infer latent beliefs via dynamics modeling. In this paper, we bridge these perspectives by introducing State-Action Inpainting Diffuser (SAID), a framework that integrates the inductive bias of dynamics learning with the direct decision-making capability of policy optimization. By formulating the problem as a joint sequence inpainting task, SAID implicitly captures environmental dynamics while directly generating consistent plans, effectively operating at the intersection of the model-based and model-free paradigms. Crucially, this generative formulation allows SAID to be applied to both online and offline RL. Extensive experiments on delayed continuous control benchmarks demonstrate that SAID achieves state-of-the-art and robust performance. Our study suggests a new methodology for advancing the field of RL with delay.
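
As an illustration of the state-augmentation idea mentioned above (the generic model-free technique, not SAID itself), here is a minimal sketch assuming a fixed, known delay of `delay` steps. The class name `DelayedObservationBuffer` and its methods are hypothetical and do not come from the paper. The agent sees only the state from `delay` steps ago, so the augmented state pairs that stale state with the actions issued since, which is what restores the Markov property.

```python
from collections import deque


class DelayedObservationBuffer:
    """Hypothetical helper (not from the paper): keeps the most recent
    delayed state together with the actions issued since it was produced."""

    def __init__(self, delay, initial_state, default_action):
        self.delay = delay
        # States still "in flight"; only the oldest one is visible to the agent.
        self.pending_states = deque([initial_state] * (delay + 1), maxlen=delay + 1)
        # Actions already sent whose effects have not been observed yet.
        self.recent_actions = deque([default_action] * delay, maxlen=delay)

    def step(self, new_true_state, action):
        """Record the action just taken and the (not yet visible) state it produced."""
        self.pending_states.append(new_true_state)
        self.recent_actions.append(action)

    def augmented_state(self):
        """Return (delayed state, tuple of the last `delay` actions)."""
        return (self.pending_states[0], tuple(self.recent_actions))


# Usage: with delay=2, after three steps the agent sees state 1
# plus the two actions taken since that state.
buf = DelayedObservationBuffer(delay=2, initial_state=0, default_action=0.0)
buf.step(1, 0.1)
buf.step(2, 0.2)
buf.step(3, 0.3)
print(buf.augmented_state())
```

A policy trained on this augmented state conditions on everything needed to predict the current, unobserved state. The abstract's inpainting formulation can be read as filling in those unobserved states and future actions jointly, though the exact masking scheme is not specified in this summary.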

Dongqi Han, Wei Wang, Enze Zhang, Dongsheng Li • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Offline Reinforcement Learning | halfcheetah medium v2 | Average Score | 58.5 | 27 |
| Offline Reinforcement Learning | halfcheetah medium-expert v2 | Normalized Score | 106.2 | 18 |
| Offline Reinforcement Learning | walker2d medium v2 | Normalized Score | 84.8 | 18 |
| Reinforcement Learning | MuJoCo HalfCheetah v5 | Mean Episodic Return | 1.48e+4 | 17 |
| Reinforcement Learning | MuJoCo Ant v5 | Mean Episodic Return | 5.95e+3 | 17 |
| Reinforcement Learning | MuJoCo Hopper v5 | Mean Episodic Return | 3.27e+3 | 17 |
| Reinforcement Learning | MuJoCo Walker2d v5 | Mean Episodic Return | 5.22e+3 | 17 |
| Reinforcement Learning | Task Average HC, Ant, Hop, Walk v5 | Mean Episodic Return | 7.28e+3 | 17 |
| Offline Reinforcement Learning | halfcheetah medium-replay v2 | Normalized Score | 50.2 | 14 |
| Offline Reinforcement Learning | hopper medium v2 | Normalized Score | 86.6 | 14 |
