
Tackling View-Dependent Semantics in 3D Language Gaussian Splatting

About

Recent advancements in 3D Gaussian Splatting (3D-GS) enable high-quality 3D scene reconstruction from RGB images. Many studies extend this paradigm for language-driven open-vocabulary scene understanding. However, most of them simply project 2D semantic features onto 3D Gaussians and overlook a fundamental gap between 2D and 3D understanding: a 3D object may exhibit various semantics from different viewpoints, a phenomenon we term view-dependent semantics. To address this challenge, we propose LaGa (Language Gaussians), which establishes cross-view semantic connections by decomposing the 3D scene into objects. Then, it constructs view-aggregated semantic representations by clustering semantic descriptors and reweighting them based on multi-view semantics. Extensive experiments demonstrate that LaGa effectively captures key information from view-dependent semantics, enabling a more comprehensive understanding of 3D scenes. Notably, under the same settings, LaGa achieves a significant improvement of +18.7% mIoU over the previous SOTA on the LERF-OVS dataset. Our code is available at: https://github.com/SJTU-DeepVisionLab/LaGa.
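The abstract's view-aggregation step, clustering per-object multi-view semantic descriptors and reweighting the clusters, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the k-means routine, the weighting by cluster size times intra-cluster coherence, and the function names are all assumptions introduced here for clarity.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means (illustrative stand-in for any clustering method)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

def view_aggregated_representation(descriptors, k=3):
    """descriptors: (num_views, dim) unit-norm semantic vectors of one object.

    Returns cluster centers plus weights. The weighting scheme here
    (cluster size x mean cosine coherence of its members) is a
    hypothetical choice standing in for the paper's reweighting.
    """
    centers, labels = kmeans(descriptors, k)
    centers /= np.linalg.norm(centers, axis=1, keepdims=True) + 1e-8
    weights = []
    for j in range(k):
        members = descriptors[labels == j]
        if len(members) == 0:
            weights.append(0.0)
            continue
        coherence = float((members @ centers[j]).mean())
        weights.append(len(members) / len(descriptors) * coherence)
    w = np.array(weights)
    return centers, w / (w.sum() + 1e-8)

def relevance(centers, weights, text_embedding):
    """Weighted cosine similarity between a text query and the cluster centers."""
    return float((weights * (centers @ text_embedding)).sum())
```

At query time, an open-vocabulary text embedding would be scored against each object's weighted cluster centers, so that semantics seen consistently across many views dominate over view-specific outliers.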

Jiazhong Cen, Xudong Zhou, Jiemin Fang, Changsong Wen, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian · 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D Semantic Segmentation | ScanNet (15 classes) | mIoU | 35.5 | 17 |
| 3D Semantic Segmentation | ScanNet (10 classes) | mIoU | 42.6 | 17 |
| Open-vocabulary 3D object selection | LERF | Ramen Score | 61.4 | 16 |
| 3D object selection | LERF figurines scene | Peak VRAM | 24 | 14 |
| Semantic segmentation | ScanNet (19 classes) | mIoU | 32.5 | 13 |
| Open Vocabulary Semantic Segmentation | LERF-OVS | mIoU | 64 | 12 |
| Open-Vocabulary Segmentation | 3D-OVS corrected (test) | mIoU (Bed) | 96.8 | 5 |
