
Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration

About

We introduce Dr. Splat, a novel approach for open-vocabulary 3D scene understanding leveraging 3D Gaussian Splatting. Unlike existing language-embedded 3DGS methods, which rely on a rendering process, our method directly associates language-aligned CLIP embeddings with 3D Gaussians for holistic 3D scene understanding. The key to our method is a language feature registration technique in which CLIP embeddings are assigned to the dominant Gaussians intersected by each pixel-ray. Moreover, we integrate Product Quantization (PQ) trained on general large-scale image data to compactly represent embeddings without per-scene optimization. Experiments demonstrate that our approach significantly outperforms existing approaches in 3D perception benchmarks, such as open-vocabulary 3D semantic segmentation, 3D object localization, and 3D object selection tasks. For video results, please visit https://drsplat.github.io/
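The registration idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that "dominant" means the Gaussians with the largest front-to-back alpha-compositing weight along a pixel-ray, and that each Gaussian ends up with the weighted mean of the pixel embeddings registered to it. The function names, the `top_k` parameter, and the `(gaussian_ids, alphas, embedding)` ray layout are all hypothetical.

```python
from collections import defaultdict


def blending_weights(alphas):
    """Front-to-back compositing weights: w_i = alpha_i * prod_{j<i} (1 - alpha_j)."""
    weights, transmittance = [], 1.0
    for a in alphas:
        weights.append(a * transmittance)
        transmittance *= 1.0 - a
    return weights


def register_features(rays, top_k=1):
    """Assign each pixel embedding to the top_k dominant Gaussians on its ray.

    rays: iterable of (gaussian_ids, alphas, embedding); ids and alphas are
    listed front to back along the pixel-ray (assumed layout, not from the paper).
    Returns {gaussian_id: weighted-mean embedding}.
    """
    sums = {}
    totals = defaultdict(float)
    for gaussian_ids, alphas, embedding in rays:
        weights = blending_weights(alphas)
        # Keep only the Gaussians that contribute most to this pixel.
        ranked = sorted(zip(weights, gaussian_ids), reverse=True)[:top_k]
        for w, gid in ranked:
            acc = sums.setdefault(gid, [0.0] * len(embedding))
            for d, value in enumerate(embedding):
                acc[d] += w * value
            totals[gid] += w
    # Normalize by accumulated weight; skip Gaussians with zero total weight.
    return {
        gid: [v / totals[gid] for v in acc]
        for gid, acc in sums.items()
        if totals[gid] > 0
    }


if __name__ == "__main__":
    # Two toy rays with 2-D stand-ins for CLIP embeddings.
    toy_rays = [
        ([0, 1], [0.2, 0.9], [1.0, 0.0]),
        ([1, 2], [0.5, 0.5], [0.0, 1.0]),
    ]
    print(register_features(toy_rays, top_k=1))
```

In this toy case Gaussian 1 dominates both rays, so it receives a blend of the two pixel embeddings, while Gaussians 0 and 2 receive nothing. No rendering of feature maps is involved, which is the contrast with rendering-based methods that the abstract draws.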

Kim Jun-Seong, GeonU Kim, Kim Yu-Ji, Yu-Chiang Frank Wang, Jaesung Choe, Tae-Hyun Oh • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Open-vocabulary 3D Scene Understanding | LERF | Feature Distillation Time (h) | 10 | 7 |
| 3D object selection | LERF-OVS | mIoU (Waldo Kitchen) | 29.37 | 5 |
| Open-vocabulary point cloud understanding | ScanNet 19 classes | mIoU | 28.4 | 5 |
| Open-vocabulary point cloud understanding | ScanNet 15 classes | mIoU | 32.67 | 5 |
| 3D Referring Segmentation | ScanNet curated (test) | 3D mIoU | 10.56 | 5 |
| Novel-view Panoptic Segmentation | Neu3D coffee martini | mAcc (Pixel) | 88.37 | 5 |
| Novel-view Panoptic Segmentation | Neu3D flame salmon | mAcc (Pixel) | 81.22 | 5 |
| Open-Vocabulary 3D Semantic Segmentation | ScanNet 19 classes v2 | mIoU | 23.21 | 5 |
| Open-Vocabulary 3D Semantic Segmentation | ScanNet 15 classes v2 | mIoU | 25.33 | 5 |
| Open-Vocabulary 3D Semantic Segmentation | ScanNet 10 classes v2 | mIoU | 36.71 | 5 |
Showing 10 of 22 rows
