UniSem: Generalizable Semantic 3D Reconstruction from Sparse Unposed Images

About

Semantic-aware 3D reconstruction from sparse, unposed images remains challenging for feed-forward 3D Gaussian Splatting (3DGS). Existing methods often predict an over-complete set of Gaussian primitives under sparse-view supervision, which leads to unstable geometry and poor depth quality. They also rely solely on 2D segmenter features for semantic lifting, which gives only weak 3D-level supervision that generalizes poorly, so the 3D semantics recovered in novel scenes are incomplete. To address these issues, we propose UniSem, a unified framework that jointly improves depth accuracy and semantic generalization via two key components. First, Error-aware Gaussian Dropout (EGD) performs error-guided capacity control: it uses rendering error cues to suppress redundancy-prone Gaussians, producing meaningful, geometrically stable Gaussian representations for improved depth estimation. Second, a Mix-training Curriculum (MTC) progressively blends 2D segmenter-lifted semantics with the model's own emergent 3D semantic priors, implemented with object-level prototype alignment to enhance semantic coherence and completeness. Extensive experiments on ScanNet and Replica show that UniSem achieves superior performance in depth prediction and open-vocabulary 3D segmentation across varying numbers of input views. Notably, with 16-view inputs, UniSem reduces depth Rel by 15.2% and improves open-vocabulary segmentation mAcc by 3.7% over strong baselines.

Guibiao Liao, Qian Ren, Kaimin Liao, Hua Wang, Zhi Chen, Luchao Wang, Yaohua Tang • 2026

Related benchmarks

Task	Dataset	Result
Novel View Synthesis	Replica	PSNR 24.448	205
Depth Estimation	ScanNet	AbsRel 3.73	133
Novel View Synthesis	ScanNet	PSNR 25.579	132
3D Semantic Segmentation	ScanNet	mIoU 55.2	57
Novel View Synthesis	ScanNet unseen real scenes	PSNR 25.58	18
Semantic Segmentation	Replica	Avg Accuracy 79.66	16
Depth Estimation	ScanNet (40 unseen scenes)	Rel Error 3.84	8
Open-Vocabulary 3D Segmentation	ScanNet (40 unseen scenes)	mIoU 56.77	7
Depth Estimation	Replica	--	6
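
For readers unfamiliar with the metrics in the table, the following is a minimal sketch of the textbook definitions of absolute relative depth error (AbsRel, also written Rel), mean IoU (mIoU), and mean class accuracy (mAcc). The function names, the rule of skipping invalid depths, and the rule of skipping classes absent from the ground truth are assumptions made for illustration; the exact masking and scaling conventions behind the numbers above are not stated here and may differ.

```python
from typing import Sequence, Tuple


def abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean of |pred - gt| / gt over pixels with positive ground-truth depth."""
    # Assumption: non-positive ground-truth depths mark invalid pixels.
    errors = [abs(p - g) / g for p, g in zip(pred, gt) if g > 0]
    if not errors:
        raise ValueError("no valid ground-truth depths")
    return sum(errors) / len(errors)


def miou_and_macc(
    pred: Sequence[int], gt: Sequence[int], num_classes: int
) -> Tuple[float, float]:
    """Return (mIoU, mAcc) averaged over classes present in the ground truth."""
    ious, accs = [], []
    for c in range(num_classes):
        tp = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        gt_count = sum(1 for g in gt if g == c)
        pred_count = sum(1 for p in pred if p == c)
        if gt_count == 0:
            continue  # assumption: classes absent from the ground truth are ignored
        union = gt_count + pred_count - tp
        ious.append(tp / union)      # intersection over union for class c
        accs.append(tp / gt_count)   # per-class recall, averaged into mAcc
    if not ious:
        raise ValueError("no ground-truth classes found")
    return sum(ious) / len(ious), sum(accs) / len(accs)


def relative_reduction(baseline: float, ours: float) -> float:
    """Fractional reduction of an error metric relative to a baseline."""
    return (baseline - ours) / baseline
```

Under these definitions, a statement such as "reduces depth Rel by 15.2%" would correspond to `relative_reduction(baseline, ours)` being about 0.152, assuming the reduction is reported relative to the baseline error.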

