DeepThinkVLA: Enhancing Reasoning Capability of Vision-Language-Action Models
About
Does Chain-of-Thought (CoT) reasoning genuinely improve Vision-Language-Action (VLA) models, or does it merely add overhead? Existing CoT-VLA systems report limited and inconsistent gains, yet no prior work has rigorously diagnosed when and why CoT helps robots act. Through systematic experiments, we identify two necessary conditions that must be jointly satisfied for CoT to be effective in VLA: (1) Decoding Alignment: CoT and actions must be generated with modality-appropriate mechanisms; forcing both through a single autoregressive decoder is not merely suboptimal but actively harmful, degrading performance by 4.2 percentage points; (2) Causal Alignment: CoT must be causally linked to task success via outcome-based optimization; without it, supervised CoT is indistinguishable from no reasoning at all under distribution shift, exhibiting a 32.0 pp performance drop nearly identical to the 31.6 pp drop of a reasoning-free baseline. Guided by these findings, we build DeepThinkVLA: a hybrid-attention decoder satisfies Condition 1 by pairing causal attention for language with bidirectional attention for parallel action decoding, while a two-stage SFT-then-RL pipeline satisfies Condition 2 by aligning the full reasoning-action chain with sparse task-success rewards. DeepThinkVLA achieves 97.0% success on LIBERO, 79.0% robustness on LIBERO-Plus (vs. 61.6% for $\pi_0$-FAST), and 59.3% success on RoboTwin 2.0, exceeding the strongest baseline by 21.7 points. Furthermore, we validate the practical effectiveness of our approach through real-world robot experiments. Code available at https://github.com/OpenBMB/DeepThinkVLA
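The hybrid-attention idea in the abstract (causal attention for language, bidirectional attention for action tokens) can be illustrated with a small attention-mask sketch. This is not the authors' implementation: the function name `hybrid_attention_mask`, the assumption that reasoning tokens precede action tokens in one sequence, and the choice to let every action token see the full prefix are all illustrative assumptions.

```python
def hybrid_attention_mask(num_prefix: int, num_action: int) -> list[list[bool]]:
    """Build a boolean mask where mask[i][j] is True if query i may attend to key j.

    Assumed layout (hypothetical, for illustration only):
      positions [0, num_prefix)                        -> language / reasoning tokens
      positions [num_prefix, num_prefix + num_action)  -> action tokens
    """
    total = num_prefix + num_action
    mask = [[False] * total for _ in range(total)]
    for i in range(total):
        for j in range(total):
            if i < num_prefix:
                # Language tokens are causal: attend only to themselves and the past.
                mask[i][j] = j <= i
            else:
                # Action tokens are bidirectional: attend to the whole prefix
                # and to every other action token, so they can be decoded in parallel.
                mask[i][j] = True
    return mask


if __name__ == "__main__":
    # Print a 3-token prefix followed by 2 action tokens ('x' = may attend).
    for row in hybrid_attention_mask(num_prefix=3, num_action=2):
        print("".join("x" if allowed else "." for allowed in row))
```

Under these assumptions the top-left block is lower-triangular (autoregressive text), while the rows belonging to action tokens are fully open, which is what allows all action tokens to be predicted in a single forward pass instead of one at a time.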
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Robot Manipulation | LIBERO | Object Achievement | 99 | 957 |
| Robotic Manipulation | LIBERO | Spatial Success Rate | 96.6 | 527 |
| Robotic Manipulation | LIBERO-Plus | Language Understanding Score | 84.5 | 249 |
| Robot Manipulation | LIBERO (test) | Average Success Rate | 97 | 220 |
| Robotic Manipulation | RoboTwin 2.0 | Average Success Rate | 59.3 | 100 |
| Robotic Manipulation | LIBERO Long | Success Rate | 95.02 | 91 |
| Dual-arm manipulation | RoboTwin Short Horizon Tasks 100-130 Steps 2.0 | Lift Pot Success Rate | 62 | 20 |
| Dual-arm manipulation | RoboTwin Medium Horizon Tasks 150-230 Steps 2.0 | Move Can Pot | 52 | 20 |
| Dual-arm manipulation | RoboTwin Long & Extra Long Horizon Tasks 280-650 Steps 2.0 | Handover Block | 43 | 20 |
| Robotic Manipulation | UT Austin MUTEX | Success Rate (%) | 26.09 | 8 |