From None to All: Self-Supervised 3D Reconstruction via Novel View Synthesis

About

In this paper, we introduce NAS3R, a self-supervised feed-forward framework that jointly learns explicit 3D geometry and camera parameters without ground-truth annotations or pretrained priors. During training, NAS3R reconstructs 3D Gaussians from uncalibrated, unposed context views and renders target views using its self-predicted camera parameters, so the whole pipeline can be trained with only 2D photometric supervision. To ensure stable convergence, NAS3R integrates reconstruction and camera prediction within a shared transformer backbone regulated by masked attention, and adopts a depth-based Gaussian formulation that keeps the optimization well conditioned. The framework is compatible with state-of-the-art supervised 3D reconstruction architectures and can incorporate pretrained priors or intrinsic information when available. Extensive experiments show that NAS3R outperforms other self-supervised methods, establishing a scalable, geometry-aware paradigm for 3D reconstruction from unconstrained data. Code and models are publicly available at https://ranrhuang.github.io/nas3r/.
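The training signal described above reduces to a render-and-compare loop: predict depth, Gaussian attributes, and cameras from the context views, lift the per-pixel depths to Gaussian means, render the target view with the self-predicted camera, and penalize the photometric error against the real target image. The sketch below illustrates that loop in PyTorch; the model and renderer interfaces (`model`, `renderer`, `pred["depth"]`, `pred["K"]`, and so on) are hypothetical placeholders rather than the authors' released API, and the L1 photometric loss stands in for whatever reconstruction loss the paper actually uses.

```python
# Minimal sketch of a NAS3R-style self-supervised objective.
# All interfaces here are hypothetical placeholders, not the authors' code.
import torch
import torch.nn.functional as F

def unproject_depth(depth, K_inv, cam_to_world):
    """Lift a per-pixel depth map to 3D Gaussian means in the world frame.

    depth:        (B, H, W) predicted depth
    K_inv:        (B, 3, 3) inverse of the predicted intrinsics
    cam_to_world: (B, 4, 4) predicted camera-to-world pose
    """
    B, H, W = depth.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=depth.dtype, device=depth.device),
        torch.arange(W, dtype=depth.dtype, device=depth.device),
        indexing="ij",
    )
    # Homogeneous pixel coordinates at pixel centers: (H, W, 3).
    pix = torch.stack([xs + 0.5, ys + 0.5, torch.ones_like(xs)], dim=-1)
    rays = torch.einsum("bij,hwj->bhwi", K_inv, pix)      # camera-frame rays
    pts_cam = rays * depth.unsqueeze(-1)                  # scale rays by depth
    pts_h = F.pad(pts_cam, (0, 1), value=1.0)             # homogeneous coords
    pts_world = torch.einsum("bij,bhwj->bhwi", cam_to_world, pts_h)[..., :3]
    return pts_world.reshape(B, -1, 3)                    # (B, H*W, 3)

def training_step(model, renderer, context_views, target_views):
    """One self-supervised step: reconstruct Gaussians from unposed context
    views, render the targets with self-predicted cameras, and compare
    photometrically against the real target images."""
    # A shared backbone predicts depth, Gaussian attributes, and cameras
    # for every view -- no pose or intrinsics annotations are consumed.
    pred = model(context_views)  # hypothetical forward signature
    means = unproject_depth(pred["depth"], pred["K"].inverse(), pred["pose"])
    rendered = renderer(means, pred["gauss_attrs"], pred["target_pose"], pred["K"])
    # 2D photometric supervision: the rendered target must match the photo.
    return F.l1_loss(rendered, target_views)
```

Note how the depth-based formulation constrains each Gaussian mean to a single scalar (its depth) along a fixed pixel ray; this one-degree-of-freedom parameterization is one plausible reading of why the authors call the optimization well conditioned, though the paper's exact mechanism may differ.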
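The abstract also credits stability to masked attention inside the shared backbone. The snippet below shows one generic way a boolean attention mask can regulate which token groups see each other in a joint transformer layer; the specific mask pattern (camera tokens read image tokens, but not the reverse) is purely illustrative and not claimed to be the paper's design.

```python
# Illustrative masked attention over two token groups sharing one backbone.
# The mask pattern is an assumption for demonstration, not NAS3R's actual one.
import torch
import torch.nn.functional as F

def masked_shared_attention(img_tokens, cam_tokens, num_heads=8):
    """Joint self-attention over image and camera tokens with a block mask."""
    B, Ni, D = img_tokens.shape
    Nc = cam_tokens.shape[1]
    x = torch.cat([img_tokens, cam_tokens], dim=1)        # (B, Ni+Nc, D)
    N = Ni + Nc
    # Boolean mask: True means the query position may attend to the key.
    mask = torch.ones(N, N, dtype=torch.bool, device=x.device)
    mask[:Ni, Ni:] = False  # image tokens cannot read camera tokens
    # Split heads: (B, num_heads, N, D // num_heads); assumes D % num_heads == 0.
    q = k = v = x.reshape(B, N, num_heads, D // num_heads).transpose(1, 2)
    out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
    return out.transpose(1, 2).reshape(B, N, D)
```

Whatever the true pattern, the design intent stated in the abstract is the same: let geometry and camera prediction share one backbone while the mask limits destabilizing cross-talk between the two streams.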

Ranran Huang, Weixun Luo, Ye Mao, Krystian Mikolajczyk • 2026

Related benchmarks

Task                          Dataset      Metric               Result   Rank
Novel View Synthesis          RE10K        SSIM                 86.1     142
Novel View Synthesis          DTU          PSNR                 15.511   115
Novel View Synthesis          DL3DV        PSNR                 20.069   84
Novel View Synthesis          ACID         PSNR                 26.663   71
Pose Estimation               RE10K        AUC @ 5°             0.683    35
Pose Estimation               ACID         AUC @ 5°             44       23
Multi-view Depth Estimation   BlendedMVS   AbsRel               0.206    18
Two-view Pose Estimation      RE10K        Rotation AUC (10°)   69.9     4
Two-view Pose Estimation      ACID         Rotation AUC (10°)   66       4
Two-view Pose Estimation      DL3DV        Rotation AUC (10°)   38.5     4

(Showing 10 of 12 rows.)
