Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning

About

Vision-Language-Action (VLA) tasks require reasoning over complex visual scenes and executing adaptive actions in dynamic environments. Recent studies on reasoning VLAs show that explicit chain-of-thought (CoT) can improve generalization, but the lengthy reasoning traces incur high inference latency. We propose Fast-ThinkAct, an efficient reasoning framework that achieves compact yet performant planning through verbalizable latent reasoning. Fast-ThinkAct learns to reason efficiently with latent CoTs by distilling from a teacher, driven by a preference-guided objective that aligns manipulation trajectories and transfers both linguistic and visual planning capabilities for embodied control. This enables reasoning-enhanced policy learning that effectively connects compact reasoning to action execution. Extensive experiments across diverse embodied manipulation and reasoning benchmarks demonstrate that Fast-ThinkAct achieves strong performance with up to 89.3% lower inference latency than state-of-the-art reasoning VLAs, while maintaining effective long-horizon planning, few-shot adaptation, and failure recovery.
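
To put the reported latency figure in perspective, the short sketch below converts a percentage latency reduction into an equivalent speedup factor. It is only illustrative arithmetic: the function names and the 10-second baseline are hypothetical and do not come from the paper, which reports only the "up to 89.3%" reduction.

```python
def speedup_from_latency_reduction(reduction_pct: float) -> float:
    """Convert a percentage latency reduction into a speedup factor.

    A reduction of r percent leaves (1 - r/100) of the original latency,
    so the speedup is the reciprocal of that remaining fraction.
    """
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in [0, 100)")
    remaining_fraction = 1.0 - reduction_pct / 100.0
    return 1.0 / remaining_fraction


def reduced_latency(baseline_seconds: float, reduction_pct: float) -> float:
    """Latency that remains after applying a percentage reduction."""
    return baseline_seconds * (1.0 - reduction_pct / 100.0)


if __name__ == "__main__":
    # 89.3 is the upper-bound reduction quoted in the abstract.
    print(f"{speedup_from_latency_reduction(89.3):.2f}x")  # prints "9.35x"

    # Hypothetical 10 s baseline, chosen only to make the arithmetic concrete.
    print(f"{reduced_latency(10.0, 89.3):.2f} s")  # prints "1.07 s"
```

In other words, an 89.3% reduction corresponds to roughly a 9.3x speedup in the best reported case; since the abstract says "up to", typical gains may be smaller.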

Chi-Pin Huang, Yunze Man, Zhiding Yu, Min-Hung Chen, Jan Kautz, Yu-Chiang Frank Wang, Fu-En Yang • 2026

Related benchmarks

Task	Dataset	Result	(unlabeled)
Robot Manipulation	LIBERO	--	1025
Robotic Manipulation	LIBERO	Spatial Success Rate: 92	570
Robot Manipulation	LIBERO (test)	Average Success Rate: 89.7	237
Egocentric daily-task planning	EgoPlanBench2	Overall Success Rate: 47.5	44
Long-horizon reasoning for robotic manipulation	RoboVQA	B-1 Score: 70.1	28
Embodied Question Answering	OpenEQA	--	26
Robotic Manipulation	LIBERO five suites	Spatial Success: 92	15
Embodied Question Answering	RoboVQA	BLEU-1: 70.4	13
Zero-shot understanding of embodied scenes	OpenEQA	Score: 51.2	10
Bimanual Robot Manipulation	RoboTwin Easy Setting 2.0 (test)	Click Alarm: 70	6

Showing 10 of 12 rows
