Scalable Adaptation of 3D Geometric Foundation Models via Weak Supervision from Internet Video

About

Geometric foundation models show promise in 3D reconstruction, yet their progress is severely constrained by the scarcity of diverse, large-scale 3D annotations. While Internet videos offer virtually unlimited raw data, utilizing them as a scaling source for geometric learning is challenging due to the absence of ground-truth geometry and the presence of observational noise. To address this, we propose SAGE, a framework for Scalable Adaptation of GEometric foundation models from raw video streams. SAGE leverages a hierarchical mining pipeline to transform videos into training trajectories and hybrid supervision: (1) Informative training trajectory selection; (2) Sparse Geometric Anchoring via SfM point clouds for global structural guidance; and (3) Dense Differentiable Consistency via 3D Gaussian rendering for multi-view constraints. To prevent catastrophic forgetting, we introduce a regularization strategy using anchor data. Extensive experiments show that SAGE significantly enhances zero-shot generalization, reducing Chamfer Distance by 20-42% on unseen benchmarks (7Scenes, TUM-RGBD, Matterport3D) compared to state-of-the-art baselines. To our knowledge, SAGE pioneers the adaptation of geometric foundation models via Internet video, establishing a scalable paradigm for general-purpose 3D learning.
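The headline metric in the abstract, Chamfer Distance, compares a predicted point cloud against a reference one. The following is a minimal sketch of one common symmetric definition (mean nearest-neighbor distance in both directions). The function name `chamfer_distance` and the choice of unsquared Euclidean distance are assumptions for illustration; the paper's exact variant, point sampling, and alignment protocol are not stated here and may differ.

```python
import numpy as np


def chamfer_distance(pred: np.ndarray, ref: np.ndarray) -> float:
    """Symmetric Chamfer distance between point clouds of shape (N, 3) and (M, 3).

    Assumption: mean unsquared Euclidean nearest-neighbor distance, summed over
    both directions. Other conventions (squared distances, averaging the two
    terms) are also in use.
    """
    # Pairwise distances via broadcasting: result has shape (N, M).
    diff = pred[:, None, :] - ref[None, :, :]
    dists = np.linalg.norm(diff, axis=-1)

    # For each predicted point, distance to its nearest reference point.
    pred_to_ref = dists.min(axis=1).mean()
    # For each reference point, distance to its nearest predicted point.
    ref_to_pred = dists.min(axis=0).mean()
    return float(pred_to_ref + ref_to_pred)
```

This brute-force version builds the full N x M distance matrix, so it is only practical for small clouds; a spatial index such as a k-d tree would be the usual choice at benchmark scale.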

Zihui Gao, Ke Liu, Donny Y. Chen, Duochao Shi, Guosheng Lin, Hao Chen, Chunhua Shen • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Novel View Synthesis | ScanNet | PSNR | 20.98 | 58
3D Scene Reconstruction | 7-Scenes (test) | -- | -- | 27
Pose Estimation | TUM-RGBD | -- | -- | 8
8-view 3D Reconstruction | ScanNet In-distribution (test) | DAc@0.2 | 91.2 | 7
8-view 3D Reconstruction | TUM-RGBD Zero-shot Generalization (test) | DAc@0.2 | 93.1 | 7
8-view 3D Reconstruction | Matterport3D Zero-shot Generalization (test) | DAc@0.5 | 48 | 7
3D Reconstruction | UASOL (out-of-distribution) | DAc-0.5 | 27.8 | 3
Pose Estimation | 7Scenes | RRE | 0.2 | 2
Pose Estimation | MP3D | RRE | 41.2 | 2