3D Reconstruction with Spatial Memory

About

We present Spann3R, a novel approach for dense 3D reconstruction from ordered or unordered image collections. Built on the DUSt3R paradigm, Spann3R uses a transformer-based architecture to directly regress pointmaps from images without any prior knowledge of the scene or camera parameters. Unlike DUSt3R, which predicts pointmaps per image pair, each expressed in its own local coordinate frame, Spann3R predicts per-image pointmaps expressed in a global coordinate system, thus eliminating the need for optimization-based global alignment. The key idea of Spann3R is to manage an external spatial memory that learns to keep track of all previous relevant 3D information. Spann3R then queries this spatial memory to predict the 3D structure of the next frame in the global coordinate system. Taking advantage of DUSt3R's pre-trained weights, and with further fine-tuning on a subset of datasets, Spann3R shows competitive performance and generalization ability on various unseen datasets and can process ordered image collections in real time. Project page: https://hengyiwang.github.io/projects/spanner
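To make the "query an external memory" idea concrete, here is a toy sketch of an attention-style memory readout in plain Python. It is an illustration under assumptions, not Spann3R's architecture: the abstract does not specify how the memory is written or read, so the `SpatialMemory` class, its `write`/`read` methods, and the use of scaled dot-product attention are all hypothetical choices made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(query, keys, values):
    """Scaled dot-product attention: blend stored values by query-key similarity."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


class SpatialMemory:
    """Hypothetical append-only key/value store (an assumption, not the paper's design)."""

    def __init__(self):
        self.keys = []
        self.values = []

    def write(self, key, value):
        # Store a feature key together with the 3D information it indexes.
        self.keys.append(list(key))
        self.values.append(list(value))

    def read(self, query):
        if not self.keys:
            raise ValueError("memory is empty")
        return read_memory(query, self.keys, self.values)


# Usage: two earlier frames write to memory, then a new frame queries it.
memory = SpatialMemory()
memory.write([1.0, 0.0], [0.0, 0.0, 1.0])
memory.write([0.0, 1.0], [2.0, 0.0, 1.0])
fused = memory.read([1.0, 1.0])  # query equally similar to both keys
```

Because the query matches both keys equally, the readout is the average of the two stored values. In a real model the keys, values, and queries would be learned features rather than hand-written vectors.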

Hengyi Wang, Lourdes Agapito • 2024

Related benchmarks

Task	Dataset	Metric	Result
Novel View Synthesis	RE10K	SSIM	76.1	345
Video Depth Estimation	Sintel	Delta Threshold Accuracy (1.25)	50.8	235
Camera pose estimation	TUM-dynamic	ATE	0.0421	205
Camera pose estimation	Sintel	ATE	0.329	203
Depth Estimation	KITTI	--	--	184
3D Reconstruction	7 Scenes	Completion	20.5	161
Video Depth Estimation	KITTI	Abs Rel	0.198	153
Monocular Depth Estimation	Sintel	Abs Rel	0.47	142
Video Depth Estimation	BONN	Abs Rel	14.4	139
Camera pose estimation	ScanNet	RPE (t)	0.023	133

Showing 10 of 125 rows
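
For readers unfamiliar with the depth metrics named in the table, the sketch below computes absolute relative error (Abs Rel) and delta threshold accuracy at 1.25 using their standard definitions. The function names and the sample values are illustrative assumptions. The listing does not state the evaluation protocol behind each row (for example scale alignment, or whether a value is reported as a fraction or a percentage), so this is not a reproduction of the reported numbers.

```python
def abs_rel(pred, gt):
    """Mean of |pred - gt| / gt over pixels with valid (positive) ground truth."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def delta_accuracy(pred, gt, threshold=1.25):
    """Fraction of valid pixels where max(pred/gt, gt/pred) < threshold."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    # Non-positive predictions cannot satisfy the ratio test, so they count as misses.
    hits = sum(1 for p, g in pairs if p > 0 and max(p / g, g / p) < threshold)
    return hits / len(pairs)


# Usage with made-up depths (metres): one exact, one 10% off, one 2x off.
pred = [1.0, 2.2, 4.0]
gt = [1.0, 2.0, 2.0]
error = abs_rel(pred, gt)
accuracy = delta_accuracy(pred, gt)
```

Lower is better for Abs Rel, while higher is better for delta accuracy. In this toy case two of the three predictions fall within the 1.25 ratio.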
