From None to All: Self-Supervised 3D Reconstruction via Novel View Synthesis

About

In this paper, we introduce NAS3R, a self-supervised feed-forward framework that jointly learns explicit 3D geometry and camera parameters without ground-truth annotations or pretrained priors. During training, NAS3R reconstructs 3D Gaussians from uncalibrated, unposed context views and renders target views using its self-predicted camera parameters, so the whole pipeline can be trained with only 2D photometric supervision. To ensure stable convergence, NAS3R integrates reconstruction and camera prediction within a shared transformer backbone regulated by masked attention, and adopts a depth-based Gaussian formulation that keeps the optimization well conditioned. The framework is compatible with state-of-the-art supervised 3D reconstruction architectures and can incorporate pretrained priors or intrinsic information when available. Extensive experiments show that NAS3R outperforms other self-supervised methods, establishing a scalable, geometry-aware paradigm for 3D reconstruction from unconstrained data. Code and models are publicly available at https://ranrhuang.github.io/nas3r/.
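The training signal described above reduces to a render-and-compare loop: predict depth, Gaussian attributes, and cameras from the context views, lift the per-pixel depths to Gaussian means, render the target view with the self-predicted camera, and penalize the photometric error against the real target image. The sketch below illustrates that loop in PyTorch; the model and renderer interfaces (`model`, `renderer`, `pred["depth"]`, `pred["K"]`, and so on) are hypothetical placeholders rather than the authors' released API, and the L1 photometric loss stands in for whatever reconstruction loss the paper actually uses.

```python
# Minimal sketch of a NAS3R-style self-supervised objective.
# All interfaces here are hypothetical placeholders, not the authors' code.
import torch
import torch.nn.functional as F

def unproject_depth(depth, K_inv, cam_to_world):
    """Lift a per-pixel depth map to 3D Gaussian means in the world frame.

    depth:        (B, H, W) predicted depth
    K_inv:        (B, 3, 3) inverse of the predicted intrinsics
    cam_to_world: (B, 4, 4) predicted camera-to-world pose
    """
    B, H, W = depth.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=depth.dtype, device=depth.device),
        torch.arange(W, dtype=depth.dtype, device=depth.device),
        indexing="ij",
    )
    # Homogeneous pixel coordinates at pixel centers: (H, W, 3).
    pix = torch.stack([xs + 0.5, ys + 0.5, torch.ones_like(xs)], dim=-1)
    rays = torch.einsum("bij,hwj->bhwi", K_inv, pix)      # camera-frame rays
    pts_cam = rays * depth.unsqueeze(-1)                  # scale rays by depth
    pts_h = F.pad(pts_cam, (0, 1), value=1.0)             # homogeneous coords
    pts_world = torch.einsum("bij,bhwj->bhwi", cam_to_world, pts_h)[..., :3]
    return pts_world.reshape(B, -1, 3)                    # (B, H*W, 3)

def training_step(model, renderer, context_views, target_views):
    """One self-supervised step: reconstruct Gaussians from unposed context
    views, render the targets with self-predicted cameras, and compare
    photometrically against the real target images."""
    # A shared backbone predicts depth, Gaussian attributes, and cameras
    # for every view -- no pose or intrinsics annotations are consumed.
    pred = model(context_views)  # hypothetical forward signature
    means = unproject_depth(pred["depth"], pred["K"].inverse(), pred["pose"])
    rendered = renderer(means, pred["gauss_attrs"], pred["target_pose"], pred["K"])
    # 2D photometric supervision: the rendered target must match the photo.
    return F.l1_loss(rendered, target_views)
```

Note how the depth-based formulation constrains each Gaussian mean to a single scalar (its depth) along a fixed pixel ray; this one-degree-of-freedom parameterization is one plausible reading of why the authors call the optimization well conditioned, though the paper's exact mechanism may differ.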
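The abstract also credits stability to masked attention inside the shared backbone. The snippet below shows one generic way a boolean attention mask can regulate which token groups see each other in a joint transformer layer; the specific mask pattern (camera tokens read image tokens, but not the reverse) is purely illustrative and not claimed to be the paper's design.

```python
# Illustrative masked attention over two token groups sharing one backbone.
# The mask pattern is an assumption for demonstration, not NAS3R's actual one.
import torch
import torch.nn.functional as F

def masked_shared_attention(img_tokens, cam_tokens, num_heads=8):
    """Joint self-attention over image and camera tokens with a block mask."""
    B, Ni, D = img_tokens.shape
    Nc = cam_tokens.shape[1]
    x = torch.cat([img_tokens, cam_tokens], dim=1)        # (B, Ni+Nc, D)
    N = Ni + Nc
    # Boolean mask: True means the query position may attend to the key.
    mask = torch.ones(N, N, dtype=torch.bool, device=x.device)
    mask[:Ni, Ni:] = False  # image tokens cannot read camera tokens
    # Split heads: (B, num_heads, N, D // num_heads); assumes D % num_heads == 0.
    q = k = v = x.reshape(B, N, num_heads, D // num_heads).transpose(1, 2)
    out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
    return out.transpose(1, 2).reshape(B, N, D)
```

Whatever the true pattern, the design intent stated in the abstract is the same: let geometry and camera prediction share one backbone while the mask limits destabilizing cross-talk between the two streams.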

Ranran Huang, Weixun Luo, Ye Mao, Krystian Mikolajczyk • 2026

Related benchmarks

Task                          Dataset      Metric               Result   Rank
Novel View Synthesis          RE10K        SSIM                 86.1     142
Novel View Synthesis          DTU          PSNR                 15.511   115
Novel View Synthesis          DL3DV        PSNR                 20.069   84
Novel View Synthesis          ACID         PSNR                 26.663   71
Pose Estimation               RE10K        AUC @ 5°             0.683    35
Pose Estimation               ACID         AUC @ 5°             44       23
Multi-view Depth Estimation   BlendedMVS   AbsRel               0.206    18
Two-view Pose Estimation      RE10K        Rotation AUC (10°)   69.9     4
Two-view Pose Estimation      ACID         Rotation AUC (10°)   66       4
Two-view Pose Estimation      DL3DV        Rotation AUC (10°)   38.5     4

(Showing 10 of 12 rows.)
