Uni3R: Unified 3D Reconstruction and Semantic Understanding via Generalizable Gaussian Splatting from Unposed Multi-View Images

About

Reconstructing and semantically interpreting 3D scenes from sparse 2D views remains a fundamental challenge in computer vision. Conventional methods often decouple semantic understanding from reconstruction or necessitate costly per-scene optimization, thereby restricting their scalability and generalizability. In this paper, we introduce Uni3R, a novel feed-forward framework that jointly reconstructs a unified 3D scene representation enriched with open-vocabulary semantics, directly from unposed multi-view images. Our approach leverages a Cross-View Transformer to robustly integrate information across arbitrary multi-view inputs, which then regresses a set of 3D Gaussian primitives endowed with semantic feature fields. This unified representation facilitates high-fidelity novel view synthesis, open-vocabulary 3D semantic segmentation, and depth prediction, all within a single, feed-forward pass. Extensive experiments demonstrate that Uni3R establishes a new state-of-the-art across multiple benchmarks, including 25.07 PSNR on RE10K and 55.84 mIoU on ScanNet. Our work signifies a novel paradigm towards generalizable, unified 3D scene reconstruction and understanding. The code is available at https://github.com/HorizonRobotics/Uni3R.
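To make the open-vocabulary segmentation idea concrete, the sketch below shows one common way a semantic feature could be matched against text labels: compare it to each label's embedding by cosine similarity and keep the best match. This is an illustration only and is not taken from the Uni3R code. The names `cosine` and `segment`, the 3-dimensional toy features, and the hand-written label embeddings are all hypothetical stand-ins for what a rendered feature field and a real text encoder would produce.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def segment(pixel_features, label_embeddings):
    """Assign each feature vector the label whose embedding is most similar to it."""
    labels = list(label_embeddings)
    return [
        max(labels, key=lambda name: cosine(feature, label_embeddings[name]))
        for feature in pixel_features
    ]

# Hypothetical toy data: real embeddings would come from a text encoder
# and have hundreds of dimensions.
label_embeddings = {
    "chair": [1.0, 0.0, 0.0],
    "table": [0.0, 1.0, 0.0],
    "wall": [0.0, 0.0, 1.0],
}
pixel_features = [
    [0.9, 0.1, 0.0],
    [0.1, 0.2, 0.8],
    [0.2, 0.7, 0.1],
]

print(segment(pixel_features, label_embeddings))
```

With these toy inputs the script prints `['chair', 'wall', 'table']`. Because the label set is just a dictionary of embeddings, new categories can be added at query time without retraining, which is what "open-vocabulary" refers to.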

Xiangyu Sun, Haoyi Jiang, Liu Liu, Seungtae Nam, Gyeongjin Kang, Xinjie Wang, Wei Sui, Zhizhong Su, Wenyu Liu, Xinggang Wang, Eunbyung Park • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Novel View Synthesis | ScanNet | PSNR 24.122 | 130 |
| Novel View Synthesis | DTU | PSNR 18.256 | 115 |
| Depth Estimation | ScanNet | AbsRel 4.46 | 108 |
| Novel View Synthesis | Replica | PSNR 23.888 | 69 |
| Novel View Synthesis | ScanNet++ | PSNR 22.221 | 67 |
| Novel View Synthesis | Mip-NeRF360 (test) | PSNR 17.331 | 62 |
| 3D Semantic Segmentation | ScanNet | mIoU 52.2 | 51 |
| Novel View Synthesis | RE10K 8 views | PSNR 26.629 | 22 |
| Semantic segmentation | ScanNet short-sequence | mIoU 33.75 | 21 |
| Novel View Synthesis | ScanNet unseen real scenes | PSNR 25.37 | 18 |
Showing 10 of 34 rows
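
For readers unfamiliar with the three metrics reported here (PSNR, AbsRel, mIoU), the sketch below gives their standard textbook definitions in plain Python. It is not the evaluation code behind these benchmarks. The function names, the `max_val=1.0` intensity range, the "ground-truth depth greater than zero" validity rule, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration; the actual protocols (including whether values are scaled to percent) may differ per benchmark.

```python
import math

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two flat lists of pixel values."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

def abs_rel(pred_depth, gt_depth):
    """Mean of |pred - gt| / gt over pixels with valid (positive) ground truth."""
    errors = [abs(p - g) / g for p, g in zip(pred_depth, gt_depth) if g > 0]
    if not errors:
        raise ValueError("no valid ground-truth depth values")
    return sum(errors) / len(errors)

def mean_iou(pred_labels, gt_labels, num_classes):
    """Mean intersection-over-union across classes that appear at least once."""
    ious = []
    for c in range(num_classes):
        pairs = list(zip(pred_labels, gt_labels))
        inter = sum(1 for p, g in pairs if p == c and g == c)
        union = sum(1 for p, g in pairs if p == c or g == c)
        if union > 0:  # assumption: ignore classes absent from both
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class present in prediction or ground truth")
    return sum(ious) / len(ious)
```

Higher is better for PSNR and mIoU, while lower is better for AbsRel, so the three columns of results should not be compared on a common scale.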
