SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians

About

3D Gaussian Splatting has recently gained traction for its efficient training and real-time rendering. While its vanilla representation is mainly designed for view synthesis, recent works have extended it to scene understanding with language features. However, storing additional high-dimensional features per Gaussian for semantic information is memory-intensive, which limits their ability to segment and interpret challenging scenes. To this end, we introduce SuperGSeg, a novel approach that fosters cohesive, context-aware hierarchical scene representation by disentangling segmentation and language field distillation. SuperGSeg first employs neural 3D Gaussians to learn geometry, instance and hierarchical segmentation features from multi-view images with the aid of off-the-shelf 2D masks. These features are then leveraged to create a sparse set of Super-Gaussians. Super-Gaussians facilitate the lifting and distillation of 2D language features into 3D space. They enable hierarchical scene understanding with high-dimensional language feature rendering at moderate GPU memory costs. Extensive experiments demonstrate that SuperGSeg achieves remarkable performance on both open-vocabulary object selection and semantic segmentation tasks.

Siyun Liang, Sen Wang, Kunyi Li, Michael Niemeyer, Stefano Gasperini, Hendrik P.A. Lensch, Nassir Navab, Federico Tombari • 2024

Related benchmarks

Task	Dataset	Metric	Result	Rank
Open Vocabulary Semantic Segmentation	LERF-OVS	mIoU	35.9	12
3D Open-Vocabulary Query	LERF-OVS	Mean mIoU	35.94	7
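
Both rows report mean intersection-over-union (mIoU). As a reminder of what that metric measures, here is a minimal sketch in plain Python. It assumes flat lists of integer class labels and averages over classes that appear in either the prediction or the ground truth; the function name `mean_iou` and the toy labels are illustrative, and the actual LERF-OVS evaluation protocol (for example per-scene or per-query averaging) may differ.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes present in the prediction or the ground truth.

    pred, target: equal-length sequences of integer class labels.
    This is an illustrative sketch, not the benchmark's official evaluator.
    """
    ious = []
    for c in range(num_classes):
        # Intersection: positions where both prediction and target are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Union: positions where either prediction or target is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes (hypothetical labels).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {100 * score:.1f}")
```

Scores in the table above are on a 0 to 100 scale, so the fraction returned here would be multiplied by 100 for comparison.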

