Uni3R: Unified 3D Reconstruction and Semantic Understanding via Generalizable Gaussian Splatting from Unposed Multi-View Images
About
Reconstructing and semantically interpreting 3D scenes from sparse 2D views remains a fundamental challenge in computer vision. Conventional methods often decouple semantic understanding from reconstruction or require costly per-scene optimization, which limits their scalability and generalizability. In this paper, we introduce Uni3R, a novel feed-forward framework that jointly reconstructs a unified 3D scene representation enriched with open-vocabulary semantics, directly from unposed multi-view images. Our approach leverages a Cross-View Transformer to robustly integrate information across an arbitrary number of input views and then regresses a set of 3D Gaussian primitives endowed with semantic feature fields. This unified representation supports high-fidelity novel view synthesis, open-vocabulary 3D semantic segmentation, and depth prediction, all within a single feed-forward pass. Extensive experiments demonstrate that Uni3R establishes new state-of-the-art results across multiple benchmarks, including 25.07 PSNR on RE10K and 55.84 mIoU on ScanNet. Our work points toward a new paradigm for generalizable, unified 3D scene reconstruction and understanding. The code is available at https://github.com/HorizonRobotics/Uni3R.
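The abstract describes each 3D Gaussian primitive as carrying a semantic feature alongside its geometry and appearance, which is what makes segmentation with arbitrary text labels possible. The following is a minimal sketch of that idea, assuming each Gaussian stores a feature vector that lives in the same embedding space as a text encoder (for example, CLIP). The `SemanticGaussian` class, the `label_gaussians` helper, and the toy 3-D embeddings are hypothetical illustrations and do not reflect Uni3R's actual data layout or code.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class SemanticGaussian:
    """One 3D Gaussian primitive with an attached semantic feature (hypothetical layout)."""
    mean: Tuple[float, float, float]             # center in world coordinates
    scale: Tuple[float, float, float]            # per-axis extent of the Gaussian
    rotation: Tuple[float, float, float, float]  # unit quaternion (w, x, y, z)
    opacity: float                               # in [0, 1]
    color: Tuple[float, float, float]            # RGB in [0, 1]
    feature: Sequence[float]                     # semantic embedding (assumed text-aligned)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def label_gaussians(gaussians: List[SemanticGaussian],
                    class_names: List[str],
                    text_embeddings: List[Sequence[float]]) -> List[str]:
    """Give each Gaussian the class whose text embedding is most similar to its feature."""
    labels = []
    for g in gaussians:
        scores = [cosine(g.feature, t) for t in text_embeddings]
        best = max(range(len(scores)), key=scores.__getitem__)  # first index wins ties
        labels.append(class_names[best])
    return labels


# Toy 3-D "text embeddings"; a real system would use much larger encoder outputs.
classes = ["wall", "chair", "table"]
text_emb = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
scene = [
    SemanticGaussian((0, 0, 0), (0.1,) * 3, (1, 0, 0, 0), 0.9, (0.8, 0.8, 0.8), [0.9, 0.1, 0.0]),
    SemanticGaussian((1, 0, 0), (0.05,) * 3, (1, 0, 0, 0), 0.7, (0.4, 0.2, 0.1), [0.1, 0.2, 0.95]),
]
print(label_gaussians(scene, classes, text_emb))  # ['wall', 'table']
```

Because the class list is only a set of text embeddings, new categories can be queried by encoding new names, without retraining the per-Gaussian features; this is the sense in which such a representation is open-vocabulary.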
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Novel View Synthesis | ScanNet | PSNR (dB) | 24.122 | 130 |
| Novel View Synthesis | DTU | PSNR (dB) | 18.256 | 115 |
| Depth Estimation | ScanNet | AbsRel | 4.46 | 108 |
| Novel View Synthesis | Replica | PSNR (dB) | 23.888 | 69 |
| Novel View Synthesis | ScanNet++ | PSNR (dB) | 22.221 | 67 |
| Novel View Synthesis | Mip-NeRF360 (test) | PSNR (dB) | 17.331 | 62 |
| 3D Semantic Segmentation | ScanNet | mIoU (%) | 52.2 | 51 |
| Novel View Synthesis | RE10K 8 views | PSNR (dB) | 26.629 | 22 |
| Semantic Segmentation | ScanNet short-sequence | mIoU (%) | 33.75 | 21 |
| Novel View Synthesis | ScanNet unseen real scenes | PSNR (dB) | 25.37 | 18 |
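The table uses three standard metrics: PSNR for novel view synthesis (higher is better), AbsRel for depth estimation (lower is better), and mIoU for semantic segmentation (higher is better). The sketch below gives their common textbook definitions on flat Python lists, assuming image values in [0, 1] and integer label maps. Benchmark protocols differ in details such as masking, depth alignment, class lists, per-dataset accumulation, and whether AbsRel is reported as a fraction or scaled by 100, so these functions are illustrative and will not reproduce the listed numbers exactly.

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return float("inf") if mse == 0 else 10 * math.log10(max_val ** 2 / mse)


def abs_rel(pred_depth: Sequence[float], gt_depth: Sequence[float]) -> float:
    """Mean absolute relative depth error |d_pred - d_gt| / d_gt over valid (gt > 0) pixels."""
    pairs = [(p, g) for p, g in zip(pred_depth, gt_depth) if g > 0]
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def mean_iou(pred_labels: Sequence[int], gt_labels: Sequence[int], num_classes: int) -> float:
    """Mean intersection-over-union across classes that occur in prediction or ground truth."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred_labels, gt_labels) if p == c and g == c)
        union = sum(1 for p, g in zip(pred_labels, gt_labels) if p == c or g == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)


print(psnr([0.5, 0.5], [0.6, 0.4]))               # about 20 dB (MSE = 0.01)
print(abs_rel([2.0, 3.0], [2.0, 4.0]))            # 0.125
print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], 2))    # 7/12, about 0.583
```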