NOVA3R: Non-pixel-aligned Visual Transformer for Amodal 3D Reconstruction

About

We present NOVA3R, an effective approach for non-pixel-aligned 3D reconstruction from a set of unposed images in a feed-forward manner. Unlike pixel-aligned methods that tie geometry to per-ray predictions, our formulation learns a global, view-agnostic scene representation that decouples reconstruction from pixel alignment. This addresses two key limitations of pixel-aligned 3D reconstruction: (1) it recovers both visible and invisible points, yielding a complete scene representation, and (2) it produces physically plausible geometry with fewer duplicated structures in overlapping regions. To achieve this, we introduce a scene-token mechanism that aggregates information across unposed images and a diffusion-based 3D decoder that reconstructs complete, non-pixel-aligned point clouds. Extensive experiments on both scene-level and object-level datasets demonstrate that NOVA3R outperforms state-of-the-art methods in reconstruction accuracy and completeness.

Weirong Chen, Chuanxia Zheng, Ganlin Zhang, Andrea Vedaldi, Daniel Cremers • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| 3D Scene Reconstruction | 7-Scenes (test) | -- | 27 |
| Scene Completion | SCRREAM Complete | CD 4.8 | 15 |
| Scene Completion | SCRREAM Visible | CD 0.043 | 10 |
| Object Completion | GSO K=1 (1030-object) | CD 0.02 | 6 |
| 3D Reconstruction | SC-REAM Complete, K=1 | Hole Ratio 8.8 | 4 |
| 3D Reconstruction | SC-REAM Complete, K=2 | Hole Ratio 12.1 | 4 |
| 3D Reconstruction | SC-REAM Complete, K=4 | Hole Ratio 13.4 | 4 |
| Object Completion | GSO K=2 (3090 image pairs) | CD 2.3 | 2 |
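
Several results above are reported as CD, which conventionally stands for Chamfer distance between a predicted and a reference point cloud. The page does not say which variant the benchmarks use (squared or unsquared distances, summed or averaged, any unit scaling), so the following is only a minimal sketch of one common form: the symmetric, unsquared, mean nearest-neighbor distance. The function name `chamfer_distance` and its argument names are illustrative and not taken from the paper.

```python
import math
from typing import Sequence

Point = Sequence[float]


def chamfer_distance(pred: Sequence[Point], target: Sequence[Point]) -> float:
    """Symmetric Chamfer distance between two non-empty point sets.

    Assumed variant: unsquared Euclidean distance, averaged per direction,
    then summed. The benchmarks listed above may use a different variant.
    """
    if not pred or not target:
        raise ValueError("both point sets must be non-empty")

    def mean_nearest(src: Sequence[Point], dst: Sequence[Point]) -> float:
        # For every source point, find the closest destination point,
        # then average those nearest-neighbor distances.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    # pred -> target penalizes spurious predicted points (accuracy);
    # target -> pred penalizes missing regions (completeness).
    return mean_nearest(pred, target) + mean_nearest(target, pred)


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    partial = [(0.0, 0.0, 0.0)]  # misses the second reference point
    print(chamfer_distance(partial, reference))
```

The brute-force search costs O(N·M) distance evaluations, which is fine for illustration; evaluations on large point clouds typically rely on a spatial index such as a k-d tree instead.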