AsyncVLA: Asynchronous Flow Matching for Vision-Language-Action Models

About

Vision-language-action (VLA) models have recently emerged as a powerful paradigm for building generalist robots. However, traditional VLA models that generate actions through flow matching (FM) typically rely on rigid, uniform time schedules, i.e., synchronous FM (SFM). Without action context awareness and asynchronous self-correction, SFM becomes unstable in long-horizon tasks, where a single action error can cascade into failure. In this work, we propose asynchronous flow matching VLA (AsyncVLA), a novel framework that introduces temporal flexibility through asynchronous FM (AFM) and enables self-correction in action generation. AsyncVLA departs from vanilla SFM in VLA models by generating action tokens on a non-uniform time schedule with action context awareness. In addition, our method introduces a confidence rater that estimates the confidence of the initially generated actions, enabling the model to selectively refine inaccurate action tokens before execution. Moreover, we propose a unified training procedure for SFM and AFM that endows a single model with both modes, improving KV-cache utilization. Extensive experiments on robotic manipulation benchmarks demonstrate that AsyncVLA is data-efficient and exhibits self-correction ability. AsyncVLA outperforms existing methods in both simulation and real-world evaluations. Our code is available at https://github.com/YuhuaJiang2002/AsyncVLA.

Yuhua Jiang, Shuang Cheng, Yan Ding, Feifei Gao, Biqing Qi • 2025
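
The generate-then-refine idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `toy_velocity` is a stand-in for a learned velocity network (it uses the known target and ignores context from other tokens), the confidence scores are hand-written rather than produced by a learned confidence rater, and all function names, the threshold, and the step count are assumptions made for illustration only.

```python
import random

def toy_velocity(x, t, target):
    # Stand-in for a learned velocity network (assumption): the exact
    # velocity of the straight-line path from noise to `target`.
    # Tokens already at t = 1 are treated as clean and get zero velocity.
    return [0.0 if ti >= 1.0 else (g - xi) / (1.0 - ti)
            for xi, ti, g in zip(x, t, target)]

def integrate(x, active, target, steps=10):
    # Euler integration from t = 0 to t = 1 for the active tokens only.
    # Inactive tokens sit at t = 1 and are left unchanged.
    x = list(x)
    dt = 1.0 / steps
    for k in range(steps):
        t = [k * dt if on else 1.0 for on in active]
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

def sfm_generate(target, rng, steps=10):
    # Synchronous FM: every token starts from noise and shares one schedule.
    noise = [rng.gauss(0.0, 1.0) for _ in target]
    return integrate(noise, [True] * len(target), target, steps)

def afm_refine(actions, confidence, target, rng, threshold=0.5, steps=10):
    # Asynchronous FM (toy version): only low-confidence tokens are
    # re-noised and regenerated; the others keep their drafted values.
    redo = [c < threshold for c in confidence]
    x = [rng.gauss(0.0, 1.0) if r else a for a, r in zip(actions, redo)]
    return integrate(x, redo, target, steps), redo

rng = random.Random(0)
target = [0.2, -0.4, 0.7, 0.1]        # hypothetical ground-truth actions
draft = sfm_generate(target, rng)
draft[2] += 0.5                       # simulate an error in one action token
confidence = [0.9, 0.8, 0.1, 0.95]    # hypothetical rater output
refined, redo = afm_refine(draft, confidence, target, rng)
```

In this sketch only the third token is regenerated, while the remaining tokens keep their drafted values, which mirrors the selective refinement described above under the stated simplifications.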

Related benchmarks

Task | Dataset | Metric | Result | Rank
Robot Manipulation | LIBERO (test) | Average Success Rate | 97.4 | 220
Robot Manipulation | SimplerEnv WidowX | Success Rate: Put Spoon on Towel | 70.8 | 98
Robot Manipulation | Google Robot variant evaluation SimplerEnv | Pick Coke Success Rate | 98 | 24
Robot Manipulation | AgileX PiPER real-world | Success Rate (Carrot -> Bowl) | 94 | 4