InstantSfM: Towards GPU-Native SfM for the Deep Learning Era

About

Structure-from-Motion (SfM) is a fundamental technique for recovering camera poses and scene structure from multi-view imagery, serving as a critical upstream component for applications ranging from 3D reconstruction to modern neural scene representations such as 3D Gaussian Splatting. However, most mature SfM systems remain CPU-centric and built upon traditional optimization toolchains, creating a growing mismatch with modern GPU-based, learning-driven pipelines and limiting scalability in large-scale scenes. While recent advances in GPU-accelerated bundle adjustment (BA) have demonstrated the potential of parallel sparse optimization, extending these techniques to build a complete global SfM system remains challenging due to unresolved issues in metric scale recovery and numerical robustness. In this paper, we implement a fully GPU-based and PyTorch-compatible global SfM system, named InstantSfM, to integrate seamlessly with modern learning pipelines. InstantSfM embeds metric depth priors directly into both global positioning and BA through a depth-constrained Jacobian structure, thereby resolving scale ambiguity within the optimization framework. To ensure numerical stability, we employ explicit filtering of under-constrained variables for the Jacobian matrix in an optimized GPU-friendly manner. Extensive experiments on diverse datasets demonstrate that InstantSfM achieves state-of-the-art efficiency while maintaining reconstruction accuracy comparable to both established classical pipelines and recent learning-based methods, showing up to ${\sim40\times}$ speedup over COLMAP on large-scale scenes.
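The abstract does not say which criterion InstantSfM uses to filter under-constrained variables. As a rough illustration only, the sketch below assumes one common criterion: a 3D point observed from fewer than two distinct cameras yields at most two residual rows for three unknowns, so its Jacobian block is rank-deficient and the point is dropped before optimization. The function name `filter_underconstrained` and the `(camera_id, point_id)` observation format are hypothetical, and the code is plain Python rather than the paper's GPU implementation.

```python
from collections import Counter


def filter_underconstrained(observations, min_views=2):
    """Drop observations of 3D points seen in fewer than `min_views` distinct cameras.

    observations: list of (camera_id, point_id) pairs, one per 2D measurement.
    Returns (kept_observations, kept_point_ids).
    Assumption: view count is the filtering criterion; the paper may use another.
    """
    # Count distinct cameras per point; duplicate measurements do not inflate the count.
    views_per_point = Counter(pid for _, pid in set(observations))
    kept_points = {pid for pid, n in views_per_point.items() if n >= min_views}
    # Keep only the measurements whose point survived, preserving input order.
    kept_obs = [(cid, pid) for cid, pid in observations if pid in kept_points]
    return kept_obs, sorted(kept_points)


# Toy example: points 11 and 13 are each seen by a single camera.
obs = [(0, 10), (1, 10), (2, 10), (0, 11), (1, 12), (2, 12), (1, 13)]
kept_obs, kept_points = filter_underconstrained(obs)
print(kept_points)
```

In a GPU setting the same counting could presumably be done with batched tensor operations, so that the filtered index set is produced without leaving the device.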

Jiankun Zhong, Zitong Zhan, Quankai Gao, Ziyu Chen, Haozhe Lou, Jiageng Mao, Ulrich Neumann, Chen Wang, Yue Wang • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| Novel View Synthesis | Mip-NeRF360 | PSNR 28.43 | 138 |
| Structure-from-Motion | DTU | PSNR 30.83 | 30 |
| Novel View Synthesis | Mip-NeRF 360 garden | SSIM 0.869 | 14 |
| Novel View Synthesis | Mip-NeRF 360 stump | SSIM 0.711 | 14 |
| Camera pose estimation | 7-Scenes (500 Images) | RRA@30 100 | 13 |
| Novel View Synthesis | MipNeRF360 Room | PSNR 31.04 | 12 |
| Novel View Synthesis | Mip-NeRF 360 Synthesized Varying Exposure (bicycle) | PSNR 25.73 | 9 |
| Novel View Synthesis | Mip-NeRF360 bonsai | PSNR 32.06 | 7 |
| Novel View Synthesis | Mip-NeRF360 counter | PSNR 29.23 | 7 |
| Novel View Synthesis | Mip-NeRF360 kitchen | PSNR 27.79 | 7 |
Showing 10 of 21 rows
