
Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding

About

Open-vocabulary querying in 3D space is challenging but essential for scene understanding tasks such as object localization and segmentation. Language-embedded scene representations have made progress by incorporating language features into 3D spaces. However, their efficacy heavily depends on neural networks that are resource-intensive in training and rendering. Although the recently introduced 3D Gaussians offer efficient, high-quality novel view synthesis, directly embedding language features in them leads to prohibitive memory usage and decreased performance. In this work, we introduce Language Embedded 3D Gaussians, a novel scene representation for open-vocabulary query tasks. Instead of embedding high-dimensional raw semantic features on 3D Gaussians, we propose a dedicated quantization scheme that drastically reduces the memory requirement, and a novel embedding procedure that achieves smoother yet highly accurate queries, countering multi-view feature inconsistencies and the high-frequency inductive bias of point-based representations. Our comprehensive experiments show that our representation achieves the best visual quality and language querying accuracy among current language-embedded representations, while maintaining real-time rendering frame rates on a single desktop GPU.

Jin-Chuan Shi, Miao Wang, Hao-Bin Duan, Shao-Hua Guan · 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| 3D Semantic Segmentation | ScanNet++ | mIoU (20 classes) | 2.93 | 31 |
| 3D Segmentation | Mip-NeRF 360 | mIoU | 29.1 | 31 |
| Novel View Reconstruction | HyperNeRF held-out 4D LangSplat (test) | Americano Score | 16.48 | 20 |
| Novel View Reconstruction | HyperNeRF 4D LangSplat (test) | Americano Score | 63 | 20 |
| 3D Semantic Segmentation | 3D-OVS | Bed | 84.9 | 20 |
| 3D Semantic Segmentation | ScanNet | mIoU (10 classes) | 9.84 | 17 |
| 3D object selection | LERF-OVS | mIoU (Mean) | 17.42 | 17 |
| Open-Vocabulary 3D Scene Segmentation | LeRF-mask | Figurines mIoU | 60.3 | 17 |
| Open-vocabulary 3D object selection | LERF | Ramen Score | 46 | 16 |
| 3D object selection | LERF figurines scene | Peak VRAM | 20 | 14 |

Showing 10 of 23 rows.

