
DFM-VLA: Iterative Action Refinement for Robot Manipulation via Discrete Flow Matching

About

Vision-Language-Action (VLA) models that encode actions using a discrete tokenization scheme are increasingly adopted for robotic manipulation, but existing decoding paradigms remain fundamentally limited. Whether actions are decoded sequentially by autoregressive VLAs or in parallel by discrete diffusion VLAs, once a token is generated, it is typically fixed and cannot be revised in subsequent iterations, so early token errors cannot be effectively corrected later. We propose DFM-VLA, a discrete flow matching VLA for iterative refinement of action tokens. DFM-VLA models a token-level probability velocity field that dynamically updates the full action sequence across refinement iterations. We investigate two ways to construct the velocity field: an auxiliary velocity-head formulation and an action-embedding-guided formulation. Our framework further adopts a two-stage decoding strategy with an iterative refinement stage followed by deterministic validation for stable convergence. Extensive experiments on CALVIN, LIBERO, and real-world manipulation tasks show that DFM-VLA consistently outperforms strong autoregressive, discrete diffusion, and continuous diffusion baselines in manipulation performance while retaining high inference efficiency. In particular, DFM-VLA achieves an average success length of 4.44 on CALVIN and an average success rate of 95.7% on LIBERO, highlighting the value of action refinement via discrete flow matching for robotic manipulation. Our project is available at https://chris1220313648.github.io/DFM-VLA/
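To make the idea of revisable tokens concrete, here is a minimal toy sketch of a two-stage decoder in the spirit of the abstract. It is not the authors' implementation. It assumes the common discrete flow matching sampler with a linear schedule, where at time t each token is resampled from the model's predicted clean-token distribution with probability h / (1 - t), and it assumes that "deterministic validation" means a few greedy argmax passes. The function `toy_posterior`, the `sharpness` parameter, and all default values are hypothetical stand-ins for a real VLA.

```python
import random


def toy_posterior(tokens, target, vocab_size, sharpness=0.9):
    """Hypothetical stand-in for the VLA's per-position token distribution.

    A real model would condition on the current tokens, the image, and the
    instruction; this toy simply puts most of the mass on a fixed target.
    """
    rows = []
    for tgt in target:
        row = [(1.0 - sharpness) / (vocab_size - 1)] * vocab_size
        row[tgt] = sharpness
        rows.append(row)
    return rows


def refine(target, vocab_size=8, steps=10, validation_steps=2, seed=0):
    rng = random.Random(seed)
    # Start from a fully random action-token sequence.
    tokens = [rng.randrange(vocab_size) for _ in target]
    h = 1.0 / steps

    # Stage 1 (assumed form): stochastic refinement. Every position may be
    # revised at every step, so an early mistake is not frozen in place.
    for k in range(steps):
        t = k * h
        p1 = toy_posterior(tokens, target, vocab_size)
        jump_prob = min(1.0, h / (1.0 - t))  # reaches 1 at the last step
        for i in range(len(tokens)):
            if rng.random() < jump_prob:
                tokens[i] = rng.choices(range(vocab_size), weights=p1[i])[0]

    # Stage 2 (assumed form): deterministic passes that take the argmax
    # token at each position, removing the remaining sampling noise.
    for _ in range(validation_steps):
        p1 = toy_posterior(tokens, target, vocab_size)
        tokens = [max(range(vocab_size), key=row.__getitem__) for row in p1]
    return tokens


decoded = refine([3, 1, 4, 1, 5])
```

In this toy the greedy stage always recovers the target, because the stand-in posterior ignores the current tokens. With a real model the distribution would change as the sequence is refined, which is where repeated revision matters.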

Jiayi Chen, Wenxuan Song, Shuai Chen, Jingbo Wang, Zhijun Li, Haoang Li • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Robotic Manipulation | Calvin ABCD→D | Avg Length: 4.42 | 89 |
| Robotic Manipulation | LIBERO v1 (test) | Average Success Rate: 95.7 | 46 |
| Bimanual collaborative lifting | Real-world Pot Lift | Success Rate: 77.5 | 5 |
| Grasping and placing blocks with varying height | Real-world Place Block to Plate | Success Rate: 65 | 5 |
| Grasping and placing elongated vegetables | Real-world Place Veg. to Pot | Success Rate: 70 | 5 |
