
LESV: Language Embedded Sparse Voxel Fusion for Open-Vocabulary 3D Scene Understanding

About

Recent advances in open-vocabulary 3D scene understanding rely heavily on 3D Gaussian Splatting (3DGS) to register vision-language features into 3D space. However, we identify two critical limitations in these approaches. The first is spatial ambiguity arising from unstructured, overlapping Gaussians, which necessitates probabilistic feature registration. The second is multi-level semantic ambiguity caused by pooling features over object-level masks, which dilutes fine-grained details. To address these challenges, we present a novel framework that leverages Sparse Voxel Rasterization (SVRaster) as a structured, disjoint geometry representation. By regularizing SVRaster with monocular depth and normal priors, we establish a stable geometric foundation. This enables a deterministic, confidence-aware feature registration process and suppresses the semantic-bleeding artifacts common in 3DGS. Furthermore, we resolve multi-level ambiguity by exploiting the emerging dense alignment properties of the foundation model AM-RADIO, avoiding the computational overhead of hierarchical training methods. Our approach achieves state-of-the-art performance on Open-Vocabulary 3D Object Retrieval and Point Cloud Understanding benchmarks, and it particularly excels on fine-grained queries where existing registration methods typically fail.
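The following is a minimal sketch, under stated assumptions, of what deterministic, confidence-aware registration into disjoint voxels can look like. Because each voxel is disjoint, the sketch assigns each pixel's feature to exactly one voxel instead of splitting it probabilistically across overlapping Gaussians. It then scatter-adds the features with a per-pixel confidence weight and normalizes. The function names, the one-voxel-per-pixel assignment, and the source of the confidence values are all hypothetical, chosen for illustration; this is not the authors' implementation. The open-vocabulary query is shown as plain cosine similarity against a text embedding.

```python
import numpy as np


def fuse_features(voxel_ids, feats, conf, num_voxels):
    """Confidence-weighted average of per-pixel features into voxels.

    voxel_ids: (N,) int array, the single voxel each pixel is assigned to
               (hypothetical: e.g. the voxel where the pixel's ray terminates).
    feats:     (N, D) float array of per-pixel vision-language features.
    conf:      (N,) float array of per-pixel confidences (assumed given).
    Returns (num_voxels, D) fused features and (num_voxels,) total weights.
    """
    dim = feats.shape[1]
    acc = np.zeros((num_voxels, dim))
    weight = np.zeros(num_voxels)
    # Deterministic scatter-add: each pixel contributes to exactly one voxel.
    np.add.at(acc, voxel_ids, conf[:, None] * feats)
    np.add.at(weight, voxel_ids, conf)
    fused = np.zeros_like(acc)
    seen = weight > 0  # voxels that received any confident observation
    fused[seen] = acc[seen] / weight[seen, None]
    return fused, weight


def query(fused, text_emb):
    """Cosine similarity between each voxel feature and a text embedding."""
    norms = np.linalg.norm(fused, axis=1) * np.linalg.norm(text_emb)
    sims = np.zeros(len(fused))
    valid = norms > 0  # unobserved voxels keep similarity 0
    sims[valid] = fused[valid] @ text_emb / norms[valid]
    return sims


# Usage: four pixels, four voxels; voxel 3 is never observed.
voxel_ids = np.array([0, 0, 1, 2])
feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
conf = np.array([1.0, 0.0, 0.5, 1.0])
fused, weight = fuse_features(voxel_ids, feats, conf, num_voxels=4)
sims = query(fused, np.array([1.0, 0.0]))
```

In this toy example, the zero-confidence pixel in voxel 0 contributes nothing, so voxel 0 keeps a clean `[1, 0]` feature instead of a blended one. This behavior illustrates why confidence weighting over disjoint cells can limit feature bleeding.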

Fusang Wang, Nathan Piasco, Moussab Bennehar, Luis Roldão, Dzmitry Tsishkou, Fabien Moutarde • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Open-Vocabulary 3D Semantic Segmentation | ScanNet 19 classes | mIoU | 53.22 | 12
Open-Vocabulary 3D Semantic Segmentation | ScanNet 15 classes | mIoU | 54.78 | 12
Open-Vocabulary 3D Semantic Segmentation | ScanNet 10 classes | mIoU | 65.25 | 12
3D Object Retrieval | LERF standard scene (~300 images) | Geometry computation time (min) | 15 | 4
Open-vocabulary 2D object retrieval and localization | LERF | mIoU (Ramen scene) | 63.4 | 4
Open-vocabulary 3D object retrieval | LERF | mIoU (Ramen scene) | 53.34 | 4
2D Object Retrieval | LERF | mIoU | 56.84 | 3
3D Object Retrieval | LERF | mIoU | 56.11 | 3
3D PCD Understanding | ScanNet | mIoU | 53.22 | 3
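Most rows above report mIoU. As a reference point, the sketch below shows one common definition for segmentation: compute the IoU for each class, then average over the classes present in the prediction or the ground truth. The function name is hypothetical. The exact protocol used by each benchmark is not specified on this page and may differ, for example in per-class versus per-query averaging or in how absent classes are handled. The table values appear to be percentages, that is, this quantity multiplied by 100.

```python
import numpy as np


def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union over classes present in pred or gt."""
    ious = []
    for c in range(num_classes):
        p = pred == c
        g = gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # class absent from both; skip instead of scoring it
        inter = np.logical_and(p, g).sum()
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0


# Usage: class 0 has IoU 1/2 and class 1 has IoU 2/3, so mIoU = 7/12.
score = mean_iou(np.array([0, 0, 1, 1]), np.array([0, 1, 1, 1]), num_classes=2)
```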
