LangSplat: 3D Language Gaussian Splatting

About

Humans live in a 3D world and commonly use natural language to interact with a 3D scene. Modeling a 3D language field to support open-ended language queries in 3D has gained increasing attention recently. This paper introduces LangSplat, which constructs a 3D language field that enables precise and efficient open-vocabulary querying within 3D spaces. Unlike existing methods that ground CLIP language embeddings in a NeRF model, LangSplat represents the language field with a collection of 3D Gaussians, each encoding language features distilled from CLIP. By employing a tile-based splatting technique for rendering language features, we circumvent the costly rendering process inherent in NeRF. Instead of directly learning CLIP embeddings, LangSplat first trains a scene-wise language autoencoder and then learns language features in the scene-specific latent space, thereby alleviating the substantial memory demands imposed by explicit modeling. Existing methods struggle with imprecise and vague 3D language fields, which fail to discern clear boundaries between objects. We delve into this issue and propose to learn hierarchical semantics using SAM, thereby eliminating both the need to query the language field extensively across various scales and the need for regularization with DINO features. Extensive experimental results show that LangSplat outperforms the previous state-of-the-art method, LERF, by a large margin. Notably, LangSplat is extremely efficient, achieving a 199 $\times$ speedup compared to LERF at a resolution of 1440 $\times$ 1080. We strongly recommend that readers view our video results at https://langsplat.github.io/
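The compression step described in the abstract (encode high-dimensional language features into a small scene-specific latent space, then decode them at query time) can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps in a linear autoencoder fitted by PCA for the learned scene-wise autoencoder, uses synthetic vectors in place of real CLIP embeddings, and the dimensions (512 in, 3 latent), the threshold, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def fit_linear_autoencoder(features, latent_dim):
    """Fit a linear autoencoder (PCA) to one scene's features.

    Returns (mean, basis); basis has shape (latent_dim, feature_dim).
    """
    mean = features.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:latent_dim]

def encode(features, mean, basis):
    """Project features into the low-dimensional latent space."""
    return (features - mean) @ basis.T

def decode(latents, mean, basis):
    """Map latent codes back to the original feature space."""
    return latents @ basis + mean

def cosine_similarity(a, b):
    """Cosine similarity between every row of a and every row of b."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

rng = np.random.default_rng(0)
feature_dim, latent_dim, num_features = 512, 3, 1000  # illustrative sizes

# Synthetic stand-in for one scene's language features:
# four "concepts" (cluster centers) plus small noise.
centers = rng.normal(size=(4, feature_dim))
labels = rng.integers(0, 4, size=num_features)
scene_features = centers[labels] + 0.01 * rng.normal(size=(num_features, feature_dim))

mean, basis = fit_linear_autoencoder(scene_features, latent_dim)
latents = encode(scene_features, mean, basis)    # what would be stored per primitive
reconstructed = decode(latents, mean, basis)     # recovered at query time

# Open-vocabulary style query: compare decoded features with a query embedding.
query = centers[2]
relevance = cosine_similarity(reconstructed, query[None, :])[:, 0]
selected = relevance > 0.5  # hypothetical relevance threshold

relative_error = (np.linalg.norm(reconstructed - scene_features)
                  / np.linalg.norm(scene_features))
storage_ratio = latents.size / scene_features.size
```

In this toy setting the latent codes take `latent_dim / feature_dim` of the original storage, and the decoded features still separate the queried concept from the others. A single scene typically contains a limited set of concepts, which is why a scene-specific latent space can be much smaller than the full embedding space.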

Minghan Qin, Wanhua Li, Jiawei Zhou, Haoqian Wang, Hanspeter Pfister • 2023

Related benchmarks

Task	Dataset	Metric	Result
3D Semantic Segmentation	ScanNet V2 (val)	mIoU	29.47	209
Depth Estimation	ScanNet	--	--	121
Semantic Segmentation	ScanNet (test)	mIoU	26.86	64
3D Semantic Segmentation	Replica	3D mIoU	4.82	47
3D Semantic Segmentation	3D-OVS	Bed	99.2	42
3D Segmentation	Mip-NeRF 360	mIoU	54.7	31
3D Semantic Segmentation	ScanNet++	mIoU (20 classes)	2.21	31
3D Object Selection	LERF-OVS	mIoU (Mean)	9.66	21
Novel View Reconstruction	HyperNeRF held-out 4D LangSplat (test)	Americano Score	16.57	20
Novel View Reconstruction	HyperNeRF 4D LangSplat (test)	Americano Score	63	20

Showing 10 of 113 rows
