ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning

About

Vision-language-action (VLA) reasoning tasks require agents to interpret multimodal instructions, perform long-horizon planning, and act adaptively in dynamic environments. Existing approaches typically train VLA models in an end-to-end fashion, directly mapping inputs to actions without explicit reasoning, which hinders their ability to plan over multiple steps or adapt to complex task variations. In this paper, we propose ThinkAct, a dual-system framework that bridges high-level reasoning with low-level action execution via reinforced visual latent planning. ThinkAct trains a multimodal LLM to generate embodied reasoning plans guided by reinforcing action-aligned visual rewards based on goal completion and trajectory consistency. These reasoning plans are compressed into a visual plan latent that conditions a downstream action model for robust action execution on target environments. Extensive experiments on embodied reasoning and robot manipulation benchmarks demonstrate that ThinkAct enables few-shot adaptation, long-horizon planning, and self-correction behaviors in complex embodied AI tasks.

Chi-Pin Huang, Yueh-Hua Wu, Min-Hung Chen, Yu-Chiang Frank Wang, Fu-En Yang • 2025
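
The abstract names two ingredients of the action-aligned visual reward, goal completion and trajectory consistency, without giving their form. The sketch below is one possible illustration, not the paper's formulation. The 2D point trajectories, the inverse-distance scoring, the equal weights, and all function names are assumptions made for this example.

```python
import math

# Hypothetical sketch: every name and formula here is an assumption,
# not taken from the ThinkAct paper.

def goal_reward(pred, ref):
    """Score in (0, 1]; higher when the predicted start and end points
    lie close to the reference start and end points."""
    d_start = math.dist(pred[0], ref[0])
    d_end = math.dist(pred[-1], ref[-1])
    return 1.0 / (1.0 + 0.5 * (d_start + d_end))

def trajectory_reward(pred, ref):
    """Score in (0, 1]; higher when the predicted path stays close to the
    reference path. Uses mean pointwise distance over equal-length paths."""
    if len(pred) != len(ref):
        raise ValueError("trajectories must have the same number of points")
    mean_d = sum(math.dist(p, r) for p, r in zip(pred, ref)) / len(ref)
    return 1.0 / (1.0 + mean_d)

def visual_reward(pred, ref, w_goal=0.5, w_traj=0.5):
    """Weighted sum of the two terms; the weights are illustrative."""
    return w_goal * goal_reward(pred, ref) + w_traj * trajectory_reward(pred, ref)

reference = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
drifting = [(0.0, 0.0), (1.0, 3.0), (2.0, 5.0)]
print(visual_reward(reference, reference))  # perfect match scores 1.0
print(visual_reward(drifting, reference))   # drifting plan scores lower
```

In a reinforcement learning setup, a scalar like this would score each sampled reasoning plan, so plans whose predicted trajectory reaches the goal and follows a plausible path are reinforced.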

Related benchmarks

Task	Dataset	Metric	Result
Robot Manipulation	LIBERO	Object Achievement	91.4	1025
Robotic Manipulation	LIBERO	Spatial Success Rate	88.3	570
Robot Manipulation	LIBERO (test)	Average Success Rate	84.4	237
Robotic Manipulation	LIBERO	Long-horizon Success Rate	70.9	165
Robot Manipulation	SimplerEnv WidowX	Overall Success Rate	43.8	123
Robotic Manipulation	LIBERO v1 (test)	Average Success Rate	84.4	118
Robotic Manipulation	LIBERO	Long Success Rate	70.9	108
Robotic Manipulation	LIBERO (test)	Object Success Rate	91.4	85
Robot Manipulation	SimplerEnv Google Robot Visual Matching	Pick Coke Can	92	79
Robot Manipulation	SimplerEnv WidowX Robot tasks (test)	Success Rate (Spoon)	58.3	79

Showing 10 of 52 rows
