DEFLECT: Temporal Counterfactual Preference Learning for Delay-Robust Asynchronous VLAs

About

Vision-Language-Action (VLA) policies increasingly rely on asynchronous inference to hide large-model latency behind ongoing robot motion. While this avoids the stop-and-go behavior of synchronous action-chunk execution, it creates a prediction-execution mismatch: the next chunk is computed from a stale observation at inference start but executed only after the robot and scene have evolved. As a result, actions that fit the prediction-time state can become misaligned with the execution-time state. Existing runtime repair, behavior-cloning, and preference-alignment approaches do not directly teach the policy to resolve this stale-input mismatch. We propose DEFLECT, an offline post-training framework for delay-robust asynchronous VLAs. DEFLECT converts latency-induced mismatch into counterfactual preference supervision: a frozen reference VLA generates a preferred chunk from the future execution-time observation and a rejected chunk from the stale prediction-time observation. The trainable policy scores both chunks under the same deployment-time input, learning to favor execution-time-aligned actions while a supervised fine-tuning anchor preserves the expert action manifold. DEFLECT requires no human preference labels, reward models, online robot rollouts, architectural changes, or additional inference-time computation. Across Kinetix, LIBERO, and three real-robot tasks, DEFLECT improves delay robustness over strong asynchronous VLA baselines, raising high-latency success by up to 6.4 percentage points and achieving a 4.6 percentage-point gain at the longest delay on a real-scale VLA.
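The pair construction and training objective described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not state the exact loss, so the logistic (Bradley-Terry style) preference term, the `beta` and `sft_weight` parameters, and every function name here are assumptions. Observations and action chunks are replaced with plain floats and lists so the example runs without a robot or a model.

```python
import math
from typing import Callable, List, Sequence, Tuple

Observation = float   # hypothetical stand-in for a camera/state observation
Chunk = List[float]   # hypothetical stand-in for an action chunk


def build_counterfactual_pair(
    observations: Sequence[Observation],
    t: int,
    delay: int,
    reference_policy: Callable[[Observation], Chunk],
) -> Tuple[Observation, Chunk, Chunk]:
    """Build one (input, preferred, rejected) triple from a logged trajectory.

    Assumption: inference starts at step t and the chunk begins executing
    at step t + delay.
    """
    stale_obs = observations[t]            # what the policy sees at inference start
    future_obs = observations[t + delay]   # the state in which the chunk executes
    preferred = reference_policy(future_obs)  # chunk aligned with execution time
    rejected = reference_policy(stale_obs)    # chunk aligned with the stale input
    return stale_obs, preferred, rejected


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def preference_loss_with_sft_anchor(
    logp_preferred: float,
    logp_rejected: float,
    sft_nll: float,
    beta: float = 1.0,
    sft_weight: float = 1.0,
) -> float:
    """Assumed objective: logistic preference term plus a weighted SFT term.

    Both log-probabilities come from the trainable policy conditioned on the
    same stale, deployment-time input. sft_nll is the negative log-likelihood
    of the expert chunk under that policy.
    """
    margin = beta * (logp_preferred - logp_rejected)
    return -log_sigmoid(margin) + sft_weight * sft_nll


# Toy usage: the "reference policy" simply echoes the observation it receives.
toy_reference = lambda obs: [obs, obs + 1.0]
stale, preferred, rejected = build_counterfactual_pair(
    [0.0, 1.0, 2.0, 3.0], t=0, delay=2, reference_policy=toy_reference
)
loss = preference_loss_with_sft_anchor(logp_preferred=-0.5, logp_rejected=-2.0, sft_nll=0.3)
```

In this sketch the trainable policy never sees the future observation. It only scores the two chunks from the stale input, which is consistent with the claim that no additional inference-time computation is required.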

Yixiang Zhu, Yonghao Chen, Zijie Yang, Yusong Hu, Xinyu Chen • 2026

Related benchmarks

Task	Dataset	Result
Robot Manipulation	LIBERO Object	Success Rate 99.6	139
Robotic Task Completion	Kinetix	Success Rate 91.3	60
Robot Manipulation	LIBERO Spatial	--	41
Robot Manipulation	LIBERO-10	Success Rate 96	31
Robot Manipulation	LIBERO Goal	Success Rate 97.6	29
Delay-robust robot control	Kinetix	Success Rate (Avg d=0-7) 83.3	6
