NOVA3R: Non-pixel-aligned Visual Transformer for Amodal 3D Reconstruction

About

We present NOVA3R, an effective approach for feed-forward, non-pixel-aligned 3D reconstruction from a set of unposed images. Unlike pixel-aligned methods, which tie geometry to per-ray predictions, our formulation learns a global, view-agnostic scene representation that decouples reconstruction from pixel alignment. This addresses two key limitations of pixel-aligned 3D reconstruction: (1) NOVA3R recovers both visible and invisible (occluded) points by modeling the complete scene, and (2) it produces physically plausible geometry with fewer duplicated structures in regions where views overlap. To achieve this, we introduce a scene-token mechanism that aggregates information across unposed images and a diffusion-based 3D decoder that reconstructs complete, non-pixel-aligned point clouds. Extensive experiments on both scene-level and object-level datasets show that NOVA3R outperforms state-of-the-art methods in reconstruction accuracy and completeness.
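The scene-token idea can be illustrated with a toy sketch: a fixed set of scene tokens acts as queries that cross-attend to the tokens of all input views pooled together, so the output has one vector per scene token regardless of how many views are given or where their pixels lie. This is an assumption-laden illustration, not NOVA3R's architecture: it uses single-head attention with identity projections (no learned weights, residuals, or normalization), and the helper names `cross_attend` and `aggregate_scene_tokens` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention with identity projections (toy assumption)."""
    dim = len(queries[0])
    out: List[Vector] = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return out


def aggregate_scene_tokens(scene_tokens: List[Vector], per_view_tokens: List[List[Vector]]) -> List[Vector]:
    """Hypothetical helper: fixed scene tokens read from the tokens of all views at once."""
    # Pool tokens from every view into one set; no camera poses or pixel indices are used.
    pooled = [tok for view in per_view_tokens for tok in view]
    return cross_attend(scene_tokens, pooled, pooled)


if __name__ == "__main__":
    scene = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 scene tokens (learnable in a real model, fixed here)
    two_views = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 2.0]]]
    three_views = two_views + [[[0.5, -1.0], [1.5, 0.0]]]
    # One output vector per scene token, however many views are supplied.
    print(len(aggregate_scene_tokens(scene, two_views)), len(aggregate_scene_tokens(scene, three_views)))  # prints: 3 3
```

Because the views are pooled before attention, reordering or adding views changes only the set being attended over, not the size of the scene representation; this is one simple way to read "view-agnostic", though the paper's actual mechanism may differ.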

Weirong Chen, Chuanxia Zheng, Ganlin Zhang, Andrea Vedaldi, Daniel Cremers · 2026

Related benchmarks

Task	Dataset	Metric	Result	#
3D Scene Reconstruction	7-Scenes (test)	--	--	34
Scene Completion	SCRREAM Complete	Chamfer Distance (CD)	4.8	15
3D Reconstruction	SCRREAM Occluded	Chamfer Distance (CD)	3.56	10
3D Reconstruction	SCRREAM Complete	Chamfer Distance (CD)	3.43	10
Scene Completion	SCRREAM Visible	Chamfer Distance (CD)	0.043	10
Surface Reconstruction	DTU	Chamfer Distance (CD)	0.0307	10
3D Reconstruction	SCRREAM Visible	Chamfer Distance (CD)	3.2	10
Surface Reconstruction	BlendedMVS	Chamfer Distance (CD)	0.0413	10
3D Scene Reconstruction	Tanks & Temples (out-of-distribution)	Chamfer Distance (CD)	0.0432	10
3D Scene Reconstruction	Mip-NeRF 360 (out-of-distribution)	Chamfer Distance (CD)	0.0429	10

Showing 10 of 22 rows
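Most rows above report Chamfer Distance (CD), where lower is better. As a rough illustration only, the sketch below computes one common symmetric variant: the mean Euclidean nearest-neighbour distance from the prediction to the ground truth (accuracy) and from the ground truth to the prediction (completeness), averaged. Benchmarks differ in whether distances are squared, summed or averaged, and how they are scaled, so the exact definition behind the numbers above is an assumption here, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def mean_nn_distance(src: Sequence[Point], dst: Sequence[Point]) -> float:
    """Average Euclidean distance from each point in src to its nearest neighbour in dst."""
    if not src or not dst:
        raise ValueError("point clouds must be non-empty")
    # Brute force O(len(src) * len(dst)); fine for a sketch, too slow for real scans.
    total = sum(min(math.dist(p, q) for q in dst) for p in src)
    return total / len(src)


def chamfer_distance(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Symmetric Chamfer distance (unsquared, averaged variant; an assumption)."""
    accuracy = mean_nn_distance(pred, gt)      # penalizes predicted points far from the ground truth
    completeness = mean_nn_distance(gt, pred)  # penalizes ground-truth regions the prediction misses
    return 0.5 * (accuracy + completeness)


if __name__ == "__main__":
    gt = [(0.0, 0.0, 0.0)]
    pred = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]  # one correct point plus one spurious point
    print(chamfer_distance(pred, gt))  # prints: 0.5
```

The completeness term is the part that grows when geometry is missing, for example surfaces hidden from every input view, which is why a method that also predicts occluded points can score better on it. For large point clouds, nearest-neighbour queries are usually accelerated with a KD-tree (for instance SciPy's `cKDTree`) rather than the brute-force loop shown here.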
