Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies

About

Vision-Language-Action (VLA) models adapt large vision-language backbones to map images and instructions into robot actions. However, prevailing VLAs either generate actions autoregressively in a fixed left-to-right order, with poor performance, or attach separate diffusion heads outside the backbone, which fragments information pathways and hinders unified, scalable architectures. Instead, we present Discrete Diffusion VLA, which discretizes action chunks and models them with discrete diffusion, retaining progressive refinement inside the unified transformer backbone. Our method achieves an adaptive decoding order that resolves high-confidence action elements before harder ones and employs secondary re-masking to revisit uncertain predictions, enabling robust error correction. This design preserves pretrained vision-language priors, supports parallel decoding, and improves efficiency. Discrete Diffusion VLA achieves 96.4% average success on LIBERO, 71.2% visual matching on SimplerEnv-Fractal, and 54.2% overall on SimplerEnv-Bridge. On out-of-distribution tests of LIBERO-Goal, our method exhibits only 0.8% language degradation versus 8.0% for parallel decoding, and 20.4% vision degradation versus 29.0% for continuous diffusion, demonstrating strong retention of pretrained vision-language capabilities. We also conduct two real-robot evaluations on the AgileX Cobot Magic platform to show the method's effectiveness.
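The decoding behaviour described in the abstract (commit high-confidence action tokens first, then re-mask and revisit uncertain ones) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `decode_action_chunk`, the `predict_logits` callable standing in for the transformer backbone, the linear commit schedule, and the fixed `remask_threshold` are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

MASK = None  # hypothetical sentinel for an action token not yet decoded


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def decode_action_chunk(predict_logits, seq_len, num_steps=4, remask_threshold=0.5):
    """Fill a fully masked chunk of discrete action tokens over a few rounds.

    predict_logits: hypothetical stand-in for the backbone. It takes the
        current token list (length seq_len, MASK for open slots) and returns
        one list of logits per position.
    """
    tokens = [MASK] * seq_len
    for step in range(num_steps):
        probs = [softmax(row) for row in predict_logits(tokens)]

        # Secondary re-masking (assumed rule): reopen committed tokens whose
        # probability under the current prediction is below the threshold.
        for i, tok in enumerate(tokens):
            if tok is not MASK and probs[i][tok] < remask_threshold:
                tokens[i] = MASK

        masked = [i for i, tok in enumerate(tokens) if tok is MASK]
        if not masked:
            break

        # Adaptive order: commit the most confident open positions first.
        # Assumed linear schedule; the final step commits everything left.
        k = math.ceil(len(masked) / (num_steps - step))
        masked.sort(key=lambda i: max(probs[i]), reverse=True)
        for i in masked[:k]:
            tokens[i] = max(range(len(probs[i])), key=probs[i].__getitem__)
    return tokens
```

In this sketch every position is predicted in parallel at each round, and only the commit order depends on confidence, so easy action elements are fixed early and can condition the harder ones in later rounds.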

Zhixuan Liang, Yizhuo Li, Tianshuo Yang, Chengyue Wu, Sitong Mao, Liuao Pei, Tian Nian, Shunbo Zhou, Xiaokang Yang, Jiangmiao Pang, Yao Mu, Ping Luo • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Robot Manipulation | LIBERO | Object Achievement | 98.6 | 957 |
| Robotic Manipulation | LIBERO | Spatial Success Rate | 97.2 | 527 |
| Robot Manipulation | LIBERO (test) | Average Success Rate | 96.3 | 220 |
| Robot Manipulation | LIBERO Object | Success Rate | 96.6 | 127 |
| Robot Manipulation | LIBERO | Spatial Success Rate | 97.2 | 116 |
| Robot Manipulation | SimplerEnv WidowX | Success Rate: Put Spoon on Towel | 37.5 | 98 |
| Robot Manipulation | SimplerEnv Google Robot tasks Variant Aggregation | Average Success Rate | 39.8 | 88 |
| Robotic Manipulation | LIBERO v1 (test) | Average Success Rate | 96.3 | 83 |
| Robot Manipulation | SimplerEnv WidowX Robot tasks (test) | Success Rate (Spoon) | 29.2 | 79 |
| Robot Manipulation | SimplerEnv Google Robot tasks Visual Matching | Pick Coke Can Success Rate | 16.3 | 62 |
Showing 10 of 34 rows
