State-Action Inpainting Diffuser for Continuous Control with Delay
About
Signal delay poses a fundamental challenge in continuous control and reinforcement learning (RL) by introducing a temporal gap between interaction and perception. Current solutions have largely evolved along two distinct paradigms: model-free approaches, which use state augmentation to restore the Markov property, and model-based methods, which infer latent beliefs via dynamics modeling. In this paper, we bridge these perspectives by introducing the State-Action Inpainting Diffuser (SAID), a framework that integrates the inductive bias of dynamics learning with the direct decision-making capability of policy optimization. By formulating the problem as a joint sequence inpainting task, SAID implicitly captures environmental dynamics while directly generating consistent plans, effectively operating at the intersection of the model-based and model-free paradigms. Crucially, this generative formulation allows SAID to be applied seamlessly to both online and offline RL. Extensive experiments on delayed continuous control benchmarks demonstrate that SAID achieves state-of-the-art and robust performance. Our study suggests a new methodology for advancing RL with delay.
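To make the joint inpainting formulation concrete, here is a minimal sketch of how a diffusion model could fill in the unknown part of a state-action sequence while clamping the observed, delayed prefix. All names (`inpaint_state_action_plan`, `denoiser`, the linear noise schedule, the DDIM-style reverse loop) are illustrative assumptions, not the paper's actual architecture, schedule, or conditioning scheme.

```python
import torch

def inpaint_state_action_plan(denoiser, obs_states, obs_actions,
                              horizon, state_dim, act_dim, n_steps=50):
    """Hypothetical reverse-diffusion loop that generates the unknown part of a
    state-action sequence while clamping the observed (delayed) prefix.

    obs_states : (k, state_dim)   last states observed under a delay of k-1 steps
    obs_actions: (k-1, act_dim)   actions already issued during the delay window
    denoiser   : callable(x, t) -> predicted clean sequence, x: (horizon, state_dim + act_dim)
    """
    dim = state_dim + act_dim
    x = torch.randn(horizon, dim)                      # start the plan from pure noise

    # Template of known entries and a boolean mask marking which entries are observed.
    known = torch.zeros(horizon, dim)
    mask = torch.zeros(horizon, dim, dtype=torch.bool)
    k = obs_states.shape[0]
    known[:k, :state_dim] = obs_states
    mask[:k, :state_dim] = True
    known[:k - 1, state_dim:] = obs_actions
    mask[:k - 1, state_dim:] = True

    # Simple linear noise schedule (placeholder for whatever schedule the model was trained with).
    betas = torch.linspace(1e-4, 2e-2, n_steps)
    alpha_bars = torch.cumprod(1.0 - betas, dim=0)

    for t in reversed(range(n_steps)):
        # Conditioning by replacement: overwrite observed entries with their noised
        # ground-truth values so the model only inpaints the unknown states and actions.
        noised_known = alpha_bars[t].sqrt() * known + (1 - alpha_bars[t]).sqrt() * torch.randn_like(known)
        x = torch.where(mask, noised_known, x)

        x0_hat = denoiser(x, t)                        # predicted clean (state, action) sequence
        eps_hat = (x - alpha_bars[t].sqrt() * x0_hat) / (1 - alpha_bars[t]).sqrt()
        if t > 0:                                      # deterministic DDIM-style step toward x0_hat
            x = alpha_bars[t - 1].sqrt() * x0_hat + (1 - alpha_bars[t - 1]).sqrt() * eps_hat
        else:
            x = x0_hat

    x = torch.where(mask, known, x)                    # hard-clamp the observed prefix
    return x[:, :state_dim], x[:, state_dim:]          # generated states and actions

# Toy usage with a stand-in denoiser, just to show the shapes involved.
dummy = lambda x, t: torch.zeros_like(x)
states, actions = inpaint_state_action_plan(dummy, torch.randn(4, 17), torch.randn(3, 6),
                                            horizon=32, state_dim=17, act_dim=6)
```

The clamping step is the standard conditioning-by-replacement trick used by trajectory diffusers; how SAID actually conditions on the delayed observations and extracts the action to execute may differ from this sketch.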
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Offline Reinforcement Learning | halfcheetah medium v2 | Average Score | 58.5 | 27 |
| Offline Reinforcement Learning | halfcheetah medium-expert v2 | Normalized Score | 106.2 | 18 |
| Offline Reinforcement Learning | walker2d medium v2 | Normalized Score | 84.8 | 18 |
| Reinforcement Learning | MuJoCo HalfCheetah v5 | Mean Episodic Return | 1.48e+4 | 17 |
| Reinforcement Learning | MuJoCo Ant v5 | Mean Episodic Return | 5.95e+3 | 17 |
| Reinforcement Learning | MuJoCo Hopper v5 | Mean Episodic Return | 3.27e+3 | 17 |
| Reinforcement Learning | MuJoCo Walker2d v5 | Mean Episodic Return | 5.22e+3 | 17 |
| Reinforcement Learning | Task Average (HC, Ant, Hop, Walk) v5 | Mean Episodic Return | 7.28e+3 | 17 |
| Offline Reinforcement Learning | halfcheetah medium-replay v2 | Normalized Score | 50.2 | 14 |
| Offline Reinforcement Learning | hopper medium v2 | Normalized Score | 86.6 | 14 |