ExtrinSplat: Decoupling Geometry and Semantics for Open-Vocabulary Understanding in 3D Gaussian Splatting

About

Lifting 2D open-vocabulary understanding into 3D Gaussian Splatting (3DGS) scenes is a critical challenge. Mainstream methods, built on an embedding paradigm, suffer from three key flaws: (i) geometry-semantic inconsistency, where points, rather than objects, serve as the semantic basis, limiting semantic fidelity; (ii) semantic bloat from injecting gigabytes of feature data into the geometry; and (iii) semantic rigidity, as one feature per Gaussian struggles to capture rich polysemy. To overcome these limitations, we introduce ExtrinSplat, a framework built on the extrinsic paradigm that decouples geometry from semantics. Instead of embedding features, ExtrinSplat clusters Gaussians into multi-granularity, overlapping 3D object groups. A Vision-Language Model (VLM) then interprets these groups to generate lightweight textual hypotheses, creating an extrinsic index layer that natively supports complex polysemy. By replacing costly feature embedding with lightweight indices, ExtrinSplat reduces scene adaptation time from hours to minutes and lowers storage overhead by several orders of magnitude. On benchmark tasks for open-vocabulary 3D object selection and semantic segmentation, ExtrinSplat outperforms established embedding-based frameworks, validating the efficacy and efficiency of the proposed extrinsic paradigm.

Jiayu Ding, Xinpeng Liu, Zhiyi Pan, Shiqiang Long, Ge Li • 2025
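
To make the extrinsic paradigm described in the abstract more concrete, here is a minimal sketch of what an index layer kept outside the geometry might look like. It is an illustration under assumptions, not the authors' implementation: the `ObjectGroup` layout, the `select_gaussians` helper, and the token-overlap scoring are all hypothetical, and the hypotheses are hard-coded where the paper would obtain them from a VLM. The point it shows is that groups may overlap, each group may carry several textual hypotheses, and a query resolves to Gaussian ids without any per-Gaussian feature vector.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ObjectGroup:
    """One 3D object group: member Gaussian ids plus text hypotheses.

    Hypothetical layout for illustration; the paper does not specify this structure.
    """
    gaussian_ids: frozenset[int]
    hypotheses: list[str] = field(default_factory=list)


def tokenize(text: str) -> set[str]:
    """Lowercase whitespace tokenizer, used only by the stand-in scorer below."""
    return set(text.lower().split())


def select_gaussians(index: list[ObjectGroup], query: str) -> set[int]:
    """Return the Gaussian ids of the group(s) whose hypotheses best match the query.

    Token overlap is an assumed stand-in for whatever text matching the
    actual method uses. Returns an empty set when nothing matches.
    """
    query_tokens = tokenize(query)
    best_score = 0
    selected: set[int] = set()
    for group in index:
        # A group scores as well as its best-matching hypothesis (polysemy).
        score = max(
            (len(query_tokens & tokenize(h)) for h in group.hypotheses),
            default=0,
        )
        if score == 0:
            continue
        if score > best_score:
            best_score, selected = score, set(group.gaussian_ids)
        elif score == best_score:
            selected |= group.gaussian_ids
    return selected


# Toy scene: the geometry is just Gaussian ids 0..9. Groups overlap
# (id 2 is in two small groups) and span granularities (the first group
# covers the whole scene).
index = [
    ObjectGroup(frozenset(range(10)), ["toy figurine on a table"]),
    ObjectGroup(frozenset({0, 1, 2}), ["red apple", "piece of fruit"]),
    ObjectGroup(frozenset({2, 3, 4}), ["green mug", "coffee cup"]),
]

apple = select_gaussians(index, "apple")
cup = select_gaussians(index, "coffee cup")
```

In this sketch the index stores only id sets and short strings, which is the sense in which the semantics stay separate from the Gaussians themselves.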

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Open-vocabulary 3D object selection | LERF | Ramen Score | 45.6 | 16 |
| 3D object selection | LERF figurines scene | Peak VRAM | 8 | 14 |
| Open-Vocabulary 3D Semantic Segmentation | ScanNet 19 classes | mIoU | 45.5 | 12 |
| Open-Vocabulary 3D Semantic Segmentation | ScanNet 15 classes | mIoU | 47.2 | 12 |
| Open-Vocabulary 3D Semantic Segmentation | ScanNet 10 classes | mIoU | 53.7 | 12 |
