
DeepThinkVLA: Enhancing Reasoning Capability of Vision-Language-Action Models

About

Does Chain-of-Thought (CoT) reasoning genuinely improve Vision-Language-Action (VLA) models, or does it merely add overhead? Existing CoT-VLA systems report limited and inconsistent gains, yet no prior work has rigorously diagnosed when and why CoT helps robots act. Through systematic experiments, we identify two necessary conditions that must be jointly satisfied for CoT to be effective in VLA: (1) Decoding Alignment: CoT and actions must be generated with modality-appropriate mechanisms; forcing both through a single autoregressive decoder is not merely suboptimal but actively harmful, degrading performance by 4.2 percentage points; (2) Causal Alignment: CoT must be causally linked to task success via outcome-based optimization; without it, supervised CoT is indistinguishable from no reasoning at all under distribution shift, exhibiting a 32.0 pp performance drop nearly identical to the 31.6 pp drop of a reasoning-free baseline. Guided by these findings, we build DeepThinkVLA: a hybrid-attention decoder satisfies Condition 1 by pairing causal attention for language with bidirectional attention for parallel action decoding, while a two-stage SFT-then-RL pipeline satisfies Condition 2 by aligning the full reasoning-action chain with sparse task-success rewards. DeepThinkVLA achieves 97.0% success on LIBERO, 79.0% robustness on LIBERO-Plus (vs. 61.6% for π0-FAST), and 59.3% success on RoboTwin 2.0, exceeding the strongest baseline by 21.7 points. Furthermore, we validate the practical effectiveness of our approach through real-world robot experiments. Code available at https://github.com/OpenBMB/DeepThinkVLA
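The hybrid-attention idea in the abstract (causal attention for language, bidirectional attention for parallel action decoding) can be illustrated with a small attention-mask sketch. The abstract does not specify the exact mask, so the token layout (a prefix of observation, instruction, and CoT tokens followed by a block of action tokens) and the function name `hybrid_attention_mask` are assumptions made for illustration, not the authors' implementation.

```python
def hybrid_attention_mask(num_prefix_tokens, num_action_tokens):
    """Return mask[i][j] == True when position i may attend to position j.

    Assumed layout (hypothetical, not taken from the paper):
    [prefix tokens: observation, instruction, CoT | action tokens]
    """
    total = num_prefix_tokens + num_action_tokens
    mask = [[False] * total for _ in range(total)]
    for i in range(total):
        for j in range(total):
            if i < num_prefix_tokens:
                # Language/reasoning tokens: causal, attend only to
                # themselves and earlier positions.
                mask[i][j] = j <= i
            else:
                # Action tokens: attend to the whole prefix and to every
                # action token, so the action block can be decoded in parallel.
                mask[i][j] = True
    return mask


mask = hybrid_attention_mask(num_prefix_tokens=3, num_action_tokens=2)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

In this sketch the first three rows form a lower-triangular (causal) pattern, while the last two rows are fully open, which is one plausible reading of "causal for language, bidirectional for actions".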

Cheng Yin, Yankai Lin, Wang Xu, Sikyuen Tam, Xiangrui Zeng, Zhiyuan Liu, Zhouping Yin • 2025

Related benchmarks

Task | Dataset | Result | Rank
Robot Manipulation | LIBERO | Object Achievement: 99 | 957
Robotic Manipulation | LIBERO | Spatial Success Rate: 96.6 | 527
Robotic Manipulation | LIBERO-Plus | Language Understanding Score: 84.5 | 249
Robot Manipulation | LIBERO (test) | Average Success Rate: 97 | 220
Robotic Manipulation | RoboTwin 2.0 | Average Success Rate: 59.3 | 100
Robotic Manipulation | LIBERO Long | Success Rate: 95.02 | 91
Dual-arm manipulation | RoboTwin Short Horizon Tasks 100-130 Steps 2.0 | Lift Pot Success Rate: 62 | 20
Dual-arm manipulation | RoboTwin Medium Horizon Tasks 150-230 Steps 2.0 | Move Can Pot: 52 | 20
Dual-arm manipulation | RoboTwin Long & Extra Long Horizon Tasks 280-650 Steps 2.0 | Handover Block: 43 | 20
Robotic Manipulation | UT Austin MUTEX | Success Rate (%): 26.09 | 8
Showing 10 of 15 rows
