Our new X account is live! Follow @wizwand_team for updates
WorkDL logo mark

SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models

About

Spatial reasoning remains a fundamental challenge for Vision-Language Models (VLMs), with current approaches struggling to achieve robust performance despite recent advances. We identify that this limitation stems from a critical gap: existing methods attempt to learn spatial reasoning directly without establishing the hierarchical foundations of perception and understanding. To address this challenge, we present a comprehensive methodology for building spatial intelligence progressively. We introduce SpatialLadder-26k, a multimodal dataset containing 26,610 samples spanning object localization, single image, multi-view, and video spatial reasoning tasks, constructed through a standardized pipeline that ensures systematic coverage across modalities. Building on this dataset, we design a three-stage progressive training framework that (1) establishes spatial perception through object localization, (2) develops spatial understanding through multi-dimensional spatial tasks, and (3) strengthens complex reasoning via reinforcement learning with verifiable rewards. This approach yields SpatialLadder, a 3B-parameter model that achieves state-of-the-art performance on spatial reasoning benchmarks, with 23.4% average improvement over the base model, surpassing GPT-4o by 20.8% and Gemini-2.0-Flash by 10.1%. Notably, SpatialLadder maintains strong generalization with 7.2% improvement on out-of-domain benchmarks, demonstrating that progressive training from perception to reasoning is essential for robust spatial intelligence.

Hongxing Li, Dingming Li, Zixuan Wang, Yuchen Yan, Hang Wu, Wenqi Zhang, Yongliang Shen, Weiming Lu, Jun Xiao, Yueting Zhuang• 2025

Related benchmarks

TaskDatasetResultRank
Spatial ReasoningCV-Bench
Accuracy73.7
46
Spatial ReasoningMindCube
Accuracy43.4
37
Multimodal Spatial IntelligenceEASI (In-Domain)
Average Score39.1
32
Spatial ReasoningMindCube tiny (test)
Rot. Accuracy30.5
30
Spatial ReasoningMMSI-Bench (test)
PR Score30.3
29
Spatial ReasoningViewspatial
Accuracy44.2
28
Spatial ReasoningVSI-Bench
Accuracy44.8
24
Spatial ReasoningMMSI-Bench
Accuracy27.4
24
Spatial ReasoningSITE
Accuracy27.9
24
Spatial ReasoningCV-Bench-3D
Accuracy74.9
21
Showing 10 of 23 rows

Other info

Follow for update