
Speed3R: Sparse Feed-forward 3D Reconstruction Models

About

Recent feed-forward 3D reconstruction models accelerate 3D reconstruction by jointly inferring dense geometry and camera poses in a single pass, but their reliance on dense attention imposes quadratic complexity, a computational bottleneck that severely limits inference speed. To resolve this, we introduce Speed3R, an end-to-end trainable model inspired by the core principle of Structure-from-Motion: a sparse set of keypoints is sufficient for robust pose estimation. Speed3R features a dual-branch attention mechanism in which a compression branch creates a coarse contextual prior to guide a selection branch, which performs fine-grained attention only on the most informative image tokens. This strategy mimics the efficiency of traditional keypoint matching, achieving a 12.4x inference speedup on 1000-view sequences while introducing a minimal, controlled trade-off in geometric accuracy. Validated on standard benchmarks with both VGGT and $\pi^3$ backbones, our method delivers high-quality reconstructions at a fraction of the computational cost, paving the way for efficient large-scale scene modeling.
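
To make the dual-branch idea above concrete, here is a minimal NumPy sketch of coarse-to-fine sparse attention. It is not the authors' implementation: the function name `dual_branch_attention_sketch`, the mean-pooling of key blocks, the `block_size` and `top_blocks` parameters, and the per-query top-k selection rule are all illustrative assumptions. The abstract does not say how the two branches are fused, so the sketch simply returns both outputs.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dual_branch_attention_sketch(q, k, v, block_size=4, top_blocks=2):
    """Hypothetical coarse-to-fine attention (illustration only).

    q: (n_q, d) queries; k, v: (n_k, d) keys and values.
    n_k must be divisible by block_size.
    """
    n_q, d = q.shape
    n_k = k.shape[0]
    assert n_k % block_size == 0, "n_k must be a multiple of block_size"
    n_blocks = n_k // block_size
    scale = 1.0 / np.sqrt(d)

    # Compression branch (assumed: mean pooling). Each block of tokens is
    # summarized by one coarse key/value, giving a cheap contextual prior.
    k_blocks = k.reshape(n_blocks, block_size, d)
    v_blocks = v.reshape(n_blocks, block_size, d)
    k_coarse = k_blocks.mean(axis=1)                    # (n_blocks, d)
    v_coarse = v_blocks.mean(axis=1)                    # (n_blocks, d)
    coarse_attn = softmax(q @ k_coarse.T * scale)       # (n_q, n_blocks)
    out_coarse = coarse_attn @ v_coarse                 # (n_q, d)

    # Selection branch: the coarse scores pick the most relevant blocks
    # for each query; fine-grained attention runs only on those tokens.
    top_idx = np.argsort(-coarse_attn, axis=1)[:, :top_blocks]
    k_sel = k_blocks[top_idx].reshape(n_q, top_blocks * block_size, d)
    v_sel = v_blocks[top_idx].reshape(n_q, top_blocks * block_size, d)
    fine_attn = softmax(np.einsum("qd,qnd->qn", q, k_sel) * scale)
    out_fine = np.einsum("qn,qnd->qd", fine_attn, v_sel)

    return out_fine, out_coarse, top_idx
```

In this sketch each query scores `n_k / block_size` coarse tokens plus `top_blocks * block_size` selected tokens instead of all `n_k` tokens, which is where the savings over dense attention would come from. Setting `top_blocks` to the total number of blocks makes the selection branch equivalent to dense attention, which is a convenient sanity check.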

Weining Ren, Xiao Tan, Kai Han • 2026

Related benchmarks

Task                      Dataset                     Metric           Result   Rank
Camera pose estimation    TUM-dynamic                 ATE              0.0193   205
Point Map Estimation      7 Scenes                    Accuracy (Mean)  1.2      69
Relative Pose Estimation  ScanNet 1500 pairs (test)   AUC@5°           37.02    56
Camera pose estimation    RealEstate10K               AUC@30           74.81    46
Pose Estimation           RE10K                       --               --       35
Point Map Estimation      NRGBD                       Mean Accuracy    0.0208   32
Pose Estimation           CO3D v2                     AUC@30           89.41    19
Point Map Estimation      DTU (test)                  Accuracy (Mean)  1.175    15
Camera pose estimation    7 Scenes                    ATE              0.0591   14
Camera pose estimation    Neural RGB-D                ATE              0.0391   14
Showing 10 of 13 rows
