OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding
About
This paper introduces OpenGaussian, a method based on 3D Gaussian Splatting (3DGS) capable of 3D point-level open vocabulary understanding. Our primary motivation stems from observing that existing 3DGS-based open vocabulary methods mainly focus on 2D pixel-level parsing. These methods struggle with 3D point-level tasks due to weak feature expressiveness and inaccurate 2D-3D feature associations. To ensure robust feature representation and 3D point-level understanding, we first employ SAM masks without cross-frame associations to train instance features with 3D consistency. These features exhibit both intra-object consistency and inter-object distinction. Then, we propose a two-stage codebook to discretize these features from coarse to fine levels. At the coarse level, we consider the positional information of 3D points to achieve location-based clustering, which is then refined at the fine level. Finally, we introduce an instance-level 3D-2D feature association method that links 3D points to 2D masks, which are further associated with 2D CLIP features. Extensive experiments, including open vocabulary-based 3D object selection, 3D point cloud understanding, click-based 3D object selection, and ablation studies, demonstrate the effectiveness of our proposed method. The source code is available at our project page: https://3d-aigc.github.io/OpenGaussian
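The coarse-to-fine discretization described above can be illustrated with a minimal sketch. This is not the authors' implementation: it substitutes plain k-means for the codebook learning, and it assumes that the coarse level clusters on features concatenated with 3D positions while the fine level re-clusters each coarse group on features alone. The names `kmeans`, `two_stage_codebook`, `k_coarse`, `k_fine`, and `pos_weight` are hypothetical and chosen only for this example.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means (a stand-in, not the paper's codebook training)."""
    rng = np.random.default_rng(seed)
    k = min(k, len(x))  # never ask for more clusters than points
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    labels = np.zeros(len(x), dtype=np.int64)
    for _ in range(iters):
        # Squared distance from every point to every centroid.
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, labels


def two_stage_codebook(features, positions, k_coarse=4, k_fine=3, pos_weight=1.0):
    """Assumed scheme: coarse clustering on [features, xyz], fine on features only."""
    # Coarse level: positions are appended so that clusters are location-aware.
    coarse_in = np.concatenate([features, pos_weight * positions], axis=1)
    _, coarse_labels = kmeans(coarse_in, k_coarse)

    fine_labels = np.zeros(len(features), dtype=np.int64)
    codebook = []
    for c in np.unique(coarse_labels):
        idx = np.flatnonzero(coarse_labels == c)
        # Fine level: refine each coarse cluster using instance features alone.
        cents, local = kmeans(features[idx], k_fine)
        fine_labels[idx] = len(codebook) + local  # offset gives global code ids
        codebook.extend(cents)
    return np.stack(codebook), coarse_labels, fine_labels


# Toy data: 200 points with 6-D instance features and 3-D positions.
rng = np.random.default_rng(1)
feats = rng.normal(size=(200, 6))
xyz = rng.uniform(-1.0, 1.0, size=(200, 3))
codebook, coarse_ids, fine_ids = two_stage_codebook(feats, xyz)
```

In this sketch every point ends up with a discrete code index (`fine_ids`) into `codebook`, and each fine code belongs to exactly one coarse cluster, so spatially distant points cannot share a code even if their features are similar.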
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 3D Semantic Segmentation | ScanNet V2 (val) | mIoU | 24.89 | 171 |
| 3D Semantic Segmentation | ScanNet (test) | mIoU | 8.64 | 105 |
| 3D Semantic Mapping | Replica | mAcc | 16.66 | 25 |
| Semantic segmentation | CholecSeg8K (test) | -- | -- | 13 |
| Semantic segmentation | EndoVis18 Seq_9 | mIoU (instrument-shaft) | 29.38 | 10 |
| Semantic segmentation | EndoVis 18 (Seq 5) | mIoU (instrument-wrist) | 1.13 | 10 |
| Open-vocabulary 3D Scene Understanding | LERF | Feature Distillation Time (h) | 1 | 7 |
| Open Vocabulary Semantic Segmentation | LERF-OVS | mIoU | 38.4 | 6 |
| Semantic segmentation | ScanNet 19 classes | mIoU | 24.7 | 6 |
| 3D Panoptic Segmentation | ScanNet V2 (val) | PRQ (Thing) | 22.87 | 6 |