Thinking with Geometry: Active Geometry Integration for Spatial Reasoning

About

Recent progress in spatial reasoning with Multimodal Large Language Models (MLLMs) increasingly leverages geometric priors from 3D encoders. However, most existing integration strategies remain passive: geometry is exposed as a global stream and fused in an indiscriminate manner, which often induces semantic-geometry misalignment and redundant signals. We propose GeoThinker, a framework that shifts the paradigm from passive fusion to active perception. Instead of feature mixing, GeoThinker enables the model to selectively retrieve geometric evidence conditioned on its internal reasoning demands. GeoThinker achieves this through Spatial-Grounded Fusion applied at carefully selected VLM layers, where semantic visual priors selectively query and integrate task-relevant geometry via frame-strict cross-attention, further calibrated by Importance Gating that biases per-frame attention toward task-relevant structures. Comprehensive evaluation results show that GeoThinker sets a new state-of-the-art in spatial intelligence, achieving a peak score of 72.6 on the VSI-Bench. Furthermore, GeoThinker demonstrates robust generalization and significantly improved spatial perception across complex downstream scenarios, including embodied referring and autonomous driving. Our results indicate that the ability to actively integrate spatial structures is essential for next-generation spatial intelligence. Code can be found at https://github.com/Li-Hao-yuan/GeoThinker.
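The abstract describes the fusion step only at a high level, so the following is a minimal illustrative sketch of frame-strict cross-attention with an importance gate, not the authors' implementation. Everything here is an assumption made for illustration: the function name, the tensor shapes, the single-head formulation, the sigmoid gate computed from geometry tokens, and the residual addition. The idea it shows is that semantic tokens of each frame query only the geometry tokens of that same frame, and a per-token gate biases the attention logits before the softmax.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def frame_strict_cross_attention(sem, geo, w_q, w_k, w_v, w_gate):
    """Hypothetical single-head sketch (all names and shapes are assumptions).

    sem:    (F, N, D) semantic visual tokens, F frames with N tokens each
    geo:    (F, M, D) geometry tokens from a 3D encoder, M tokens per frame
    w_q, w_k, w_v: (D, D) projection matrices
    w_gate: (D,) projection producing one importance score per geometry token
    """
    q = sem @ w_q                      # (F, N, D) queries from semantic tokens
    k = geo @ w_k                      # (F, M, D) keys from geometry tokens
    v = geo @ w_v                      # (F, M, D) values from geometry tokens
    d = q.shape[-1]

    # Frame-strict: the einsum keeps the frame index f aligned, so tokens of
    # frame f never attend to geometry tokens of any other frame.
    logits = np.einsum("fnd,fmd->fnm", q, k) / np.sqrt(d)

    # Importance gate (assumed form): sigmoid score per geometry token,
    # added as a log-bias so low-importance tokens are down-weighted.
    gate = 1.0 / (1.0 + np.exp(-(geo @ w_gate)))      # (F, M)
    logits = logits + np.log(gate + 1e-9)[:, None, :]

    attn = softmax(logits, axis=-1)                   # (F, N, M)

    # Residual fusion (assumed): add the retrieved geometry to the semantics.
    fused = sem + np.einsum("fnm,fmd->fnd", attn, v)
    return fused, attn
```

In this sketch, perturbing the geometry tokens of one frame leaves the fused output of every other frame unchanged, which is the property the term "frame-strict" suggests.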

Haoyuan Li, Qihang Cao, Tao Tang, Kun Xiang, Zihan Guo, Jianhua Han, Hang Xu, Xiaodan Liang • 2026

Related benchmarks

Task                                   Dataset           Metric         Result  Rank
Autonomous Driving Planning            NAVSIM (navtest)  NC             97      50
Spatial Reasoning                      CV-Bench          Accuracy       85.1    46
Spatial Reasoning                      MindCube          Accuracy       83.6    37
Multimodal Spatial Intelligence        EASI (In-Domain)  Average Score  55      32
Spatial Reasoning                      Viewspatial       Accuracy       45.9    28
Spatial Reasoning                      VSI-Bench         Accuracy       72.6    24
Spatial Reasoning                      SITE              Accuracy       55.9    24
Spatial Reasoning                      MMSI-Bench        Accuracy       31.7    24
Visual Spatial Intelligence Reasoning  VSI               Accuracy       73.4    20
Visual Spatial Intelligence Reasoning  VSI-Debiased      Accuracy       0.681   20
Showing 10 of 16 rows
