DeepThinkVLA: Enhancing Reasoning Capability of Vision-Language-Action Models

About

Does Chain-of-Thought (CoT) reasoning genuinely improve Vision-Language-Action (VLA) models, or does it merely add overhead? Existing CoT-VLA systems report limited and inconsistent gains, yet no prior work has rigorously diagnosed when and why CoT helps robots act. Through systematic experiments, we identify two necessary conditions that must be jointly satisfied for CoT to be effective in VLA: (1) Decoding Alignment -- CoT and actions must be generated with modality-appropriate mechanisms; forcing both through a single autoregressive decoder is not merely suboptimal but actively harmful, degrading performance by 4.2 percentage points; (2) Causal Alignment -- CoT must be causally linked to task success via outcome-based optimization; without it, supervised CoT is indistinguishable from no reasoning at all under distribution shift, exhibiting a 32.0\,pp performance drop nearly identical to the 31.6\,pp drop of a reasoning-free baseline. Guided by these findings, we build DeepThinkVLA: a hybrid-attention decoder satisfies Condition~1 by pairing causal attention for language with bidirectional attention for parallel action decoding, while a two-stage SFT-then-RL pipeline satisfies Condition~2 by aligning the full reasoning--action chain with sparse task-success rewards. DeepThinkVLA achieves 97.0\% success on LIBERO, 79.0\% robustness on LIBERO-Plus (vs.\ 61.6\% for $\pi_0$-FAST), and 59.3\% success on RoboTwin~2.0, exceeding the strongest baseline by 21.7 points. Furthermore, we validate the practical effectiveness of our approach through real-world robot experiments. Code available at https://github.com/OpenBMB/DeepThinkVLA

Cheng Yin, Yankai Lin, Wang Xu, Sikyuen Tam, Xiangrui Zeng, Zhiyuan Liu, Zhouping Yin• 2025

Related benchmarks

Task	Dataset	Result
Robot Manipulation	LIBERO	Object Achievement99	957
Robotic Manipulation	LIBERO	Spatial Success Rate96.6	527
Robotic Manipulation	LIBERO-Plus	Language Understanding Score84.5	249
Robot Manipulation	LIBERO (test)	Average Success Rate97	220
Robotic Manipulation	RoboTwin 2.0	Average Success Rate59.3	100
Robotic Manipulation	LIBERO Long	Success Rate95.02	91
Dual-arm manipulation	RoboTwin Short Horizon Tasks 100-130 Steps 2.0	Lift Pot Success Rate62	20
Dual-arm manipulation	RoboTwin Medium Horizon Tasks 150-230 Steps 2.0	Move Can Pot52	20
Dual-arm manipulation	RoboTwin Long & Extra Long Horizon Tasks 280-650 Steps 2.0	Handover Block43	20
Robotic Manipulation	UT Austin MUTEX	Success Rate (%)26.09	8

Showing 10 of 15 rows

Other info

Follow for update

@wizwand_team Discord