
Long-tail Internet photo reconstruction

About

Internet photo collections exhibit an extremely long-tailed distribution: a few famous landmarks are densely photographed and easily reconstructed in 3D, while most real-world sites are represented with sparse, noisy, uneven imagery beyond the capabilities of both classical and learned 3D methods. We believe that tackling this long-tail regime represents one of the next frontiers for 3D foundation models. Although reliable ground-truth 3D supervision from sparse scenes is challenging to acquire, we observe that it can be effectively simulated by sampling sparse subsets from well-reconstructed Internet landmarks. To this end, we introduce MegaDepth-X, a large dataset of 3D reconstructions with clean, dense depth, together with a strategy for sampling sets of training images that mimic camera distributions in long-tail scenes. Finetuning 3D foundation models with these components yields robust reconstructions under extreme sparsity, and also enables more reliable reconstruction in symmetric and repetitive scenes, while preserving generalization to standard, dense 3D benchmark datasets.
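The idea of simulating sparse scenes by subsampling a well-reconstructed landmark can be illustrated with a very small sketch. The abstract does not describe the actual sampling strategy, so the code below is only a stand-in: it draws a uniform random subset of image identifiers, whereas the paper's strategy is designed to mimic long-tail camera distributions. The function name `sample_sparse_subset` and its parameters are hypothetical.

```python
import random


def sample_sparse_subset(image_ids, k, seed=None):
    """Draw k images from a densely photographed scene to form a sparse training set.

    Assumption: uniform random sampling. This is a placeholder for the
    paper's camera-distribution-aware strategy, which is not specified here.
    """
    image_ids = list(image_ids)
    if k > len(image_ids):
        raise ValueError("k cannot exceed the number of available images")
    rng = random.Random(seed)  # a seeded generator keeps subsets reproducible
    return sorted(rng.sample(image_ids, k))


# Hypothetical usage: a landmark with 500 registered images, reduced to 8 views.
dense_scene = [f"img_{i:04d}.jpg" for i in range(500)]
sparse_views = sample_sparse_subset(dense_scene, k=8, seed=0)
```

Because the dense reconstruction already provides poses and depth for every image, any subset drawn this way inherits ground-truth supervision without further annotation.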

Yuan Li, Yuanbo Xiangli, Hadar Averbuch-Elor, Noah Snavely, Ruojin Cai • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Point Map Estimation | 7 Scenes | Accuracy (Mean) | 6.2 | 69 |
| Point Map Estimation | ETH3D | NC Mean | 0.861 | 50 |
| Point Map Estimation | DTU | Accuracy (Mean) | 1.202 | 42 |
| Point Map Estimation | NRGBD | Mean Accuracy | 0.071 | 32 |
| Camera pose estimation | MegaDepth-X easy | RRA@5 | 95.64 | 4 |
| Camera pose estimation | MegaDepth-X (hard) | RRA@5 | 86.4 | 4 |
| Camera pose estimation | RealEstate10K | RRA@5 | 98.8 | 4 |
| Camera pose estimation | CO3D v2 | RRA@5 | 97.11 | 4 |
| Point Map Estimation | MegaDepth-X easy | Accuracy (Mean) | 5 | 4 |
| Point Map Estimation | MegaDepth-X (hard) | Accuracy (Mean) | 8.9 | 4 |
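For readers unfamiliar with the RRA@5 metric reported above, the following is a minimal sketch of how relative rotation accuracy at a 5 degree threshold is commonly computed: the percentage of image pairs whose predicted relative rotation deviates from the ground truth by less than 5 degrees. The exact evaluation protocol used for these benchmarks is not stated here, so the pair enumeration (all unordered pairs) and the rotation convention are assumptions, and the helper names are illustrative.

```python
import math
from itertools import combinations


def matmul_t(a, b):
    """Return a @ b^T for 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_angle_deg(r):
    """Rotation angle of a 3x3 rotation matrix, in degrees, from its trace."""
    cos_theta = (r[0][0] + r[1][1] + r[2][2] - 1.0) / 2.0
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    return math.degrees(math.acos(cos_theta))


def rra(pred, gt, threshold_deg=5.0):
    """Percentage of image pairs with relative rotation error below the threshold."""
    hits, total = 0, 0
    for i, j in combinations(range(len(gt)), 2):
        rel_pred = matmul_t(pred[i], pred[j])  # R_i R_j^T (assumed convention)
        rel_gt = matmul_t(gt[i], gt[j])
        err = rotation_angle_deg(matmul_t(rel_pred, rel_gt))
        hits += err < threshold_deg
        total += 1
    return 100.0 * hits / total if total else float("nan")


def rot_z(deg):
    """Rotation about the z axis, used only to build a toy example."""
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


gt_rotations = [rot_z(0), rot_z(30), rot_z(60)]
pred_rotations = [rot_z(0), rot_z(32), rot_z(70)]
score = rra(pred_rotations, gt_rotations)  # one of three pairs is within 5 degrees
```

Because the metric compares relative rotations between pairs, it is unaffected by a global rotation of the predicted coordinate frame.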