Think3D: Thinking with Space for Spatial Reasoning

About

While contemporary Vision-Language Models (VLMs) excel at 2D visual understanding, they remain constrained by a passive, 2D-centric paradigm that severely limits genuine 3D spatial reasoning. To bridge this gap, we introduce Think3D, a novel framework that equips VLM agents with interactive, 3D chain-of-thought reasoning capabilities. By integrating a suite of 3D manipulation tools, Think3D transforms passive perception into active spatial exploration, closely mirroring human geometric reasoning. We demonstrate that Think3D acts as a highly effective zero-shot plug-in for state-of-the-art closed-source models (e.g., GPT-4.1, Gemini 2.5 Pro), yielding absolute performance gains of +7.8% on BLINK Multi-view and MindCube, and +4.7% on VSI-Bench. Furthermore, to optimize tool-use in smaller open-weight models, we propose Think3D-RL, a reinforcement learning paradigm designed to autonomously learn spatial exploration strategies. When applied to Qwen3-VL-4B, Think3D-RL amplifies the performance gain from a marginal +0.7% to a substantial +10.7%. Notably, this RL formulation induces an exploration policy that qualitatively aligns with the sophisticated behavior of much larger models, entirely circumventing the need for costly operation-trajectory annotations. Ultimately, Think3D establishes tool-augmented active exploration as an effective paradigm for unlocking human-like 3D reasoning in multimodal agents. Code, models, and data are available at https://github.com/zhangzaibin/spagent.

Zaibin Zhang, Yuhan Wu, Lianjie Jia, Yifan Wang, Zhongbo Zhang, Yijiang Li, Binghao Ran, Fuxi Zhang, Zhuohan Sun, Zhenfei Yin, Lijun Wang, Huchuan Lu • 2026

Related benchmarks

Task	Dataset	Result
Spatial Reasoning	VSI-Bench	R.Dr. 61.8	370
Multi-view spatial reasoning	MindCube (tiny)	Overall Accuracy 41.7	84
Spatial Reasoning	MMSI-Bench	--	67
Spatial Reasoning	BLINK	--	57
Spatial Reasoning	VSI-Bench tiny	Avg Score 51.61	39
Spatial Reasoning	CV-3D	Depth Order 93	24
Spatial Reasoning	VSI-Bench tiny	Pass@1 Accuracy (Rel Dist) 44.7	19
Multi-view Reasoning	BLINK	MV Score 78.3	16
Multi-view Multimodal Reasoning	BLINK MV	Accuracy 53.4	16
Spatial Reasoning	BLINK Multi-view (test)	Accuracy 63.91	15

Showing 10 of 18 rows
