
Visibility-Aware Language Aggregation for Open-Vocabulary Segmentation in 3D Gaussian Splatting

About

Recently, distilling open-vocabulary language features from 2D images into 3D Gaussians has attracted significant attention. Although existing methods achieve impressive language-based interaction with 3D scenes, we observe two fundamental issues: background Gaussians that contribute negligibly to a rendered pixel receive the same feature as the dominant foreground ones, and view-specific noise in the language embeddings causes multi-view inconsistencies. We introduce Visibility-Aware Language Aggregation (VALA), a lightweight yet effective method that computes marginal contributions for each ray and applies a visibility-aware gate to retain only visible Gaussians. Moreover, we propose a streaming weighted geometric median in cosine space to merge noisy multi-view features. Our method yields a robust, view-consistent language feature embedding in a fast and memory-efficient manner. VALA improves open-vocabulary localization and segmentation across reference datasets, consistently surpassing existing works. More results are available at https://vala3d.github.io
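
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the per-ray "marginal contribution" is the standard front-to-back alpha-compositing weight, uses a hypothetical gate rule (keep Gaussians whose weight is at least `keep_ratio` times the largest weight on the ray), and approximates a weighted geometric median on the unit sphere with a simple online subgradient step. All names (`marginal_contributions`, `visibility_gate`, `keep_ratio`, `StreamingCosineMedian`) are illustrative and do not come from the paper.

```python
import math


def marginal_contributions(alphas):
    """Front-to-back compositing weights: w_i = alpha_i * prod_{j<i} (1 - alpha_j)."""
    weights = []
    transmittance = 1.0  # fraction of the ray not yet absorbed
    for alpha in alphas:
        weights.append(alpha * transmittance)
        transmittance *= 1.0 - alpha
    return weights


def visibility_gate(weights, keep_ratio=0.5):
    """Hypothetical gate: keep Gaussians with weight >= keep_ratio * max weight."""
    if not weights:
        return []
    threshold = keep_ratio * max(weights)
    return [w >= threshold for w in weights]


def _normalize(v):
    # Assumes v is a non-zero vector.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


class StreamingCosineMedian:
    """Online subgradient approximation of a weighted geometric median
    on the unit sphere (assumed update rule, positive weights only)."""

    def __init__(self):
        self.median = None
        self.total_weight = 0.0

    def update(self, feature, weight=1.0):
        f = _normalize(feature)
        self.total_weight += weight
        if self.median is None:
            self.median = f
            return self.median
        diff = [fi - mi for fi, mi in zip(f, self.median)]
        # Chordal distance between unit vectors: sqrt(2 * (1 - cosine similarity)).
        dist = math.sqrt(sum(d * d for d in diff))
        if dist > 1e-12:
            # Median-style step: its length does not grow with the distance to
            # the sample, and it is capped at dist so it never overshoots.
            step = min(weight / self.total_weight, dist)
            moved = [mi + step * d / dist for mi, d in zip(self.median, diff)]
            self.median = _normalize(moved)
        return self.median
```

Because each step has bounded length, a single outlier view can pull the running estimate only slightly, whereas a running mean would shift in proportion to how far the outlier lies. Only one vector and one scalar are stored per Gaussian, which is the kind of memory profile a streaming aggregation aims for.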

Sen Wang, Kunyi Li, Siyun Liang, Elena Alegret, Jing Ma, Nassir Navab, Stefano Gasperini • 2025

Related benchmarks

Task                                     Dataset              Result       Rank
3D Semantic Segmentation                 ScanNet 10 classes   mIoU 46.21   17
3D Semantic Segmentation                 ScanNet 15 classes   mIoU 35.1    17
Semantic segmentation                    ScanNet 19 classes   mIoU 32.11   13
Open Vocabulary Semantic Segmentation    LERF-OVS             mIoU 61.7    12