OpenMask3D: Open-Vocabulary 3D Instance Segmentation

About

We introduce the task of open-vocabulary 3D instance segmentation. Current approaches for 3D instance segmentation can typically only recognize object categories from a pre-defined closed set of classes that are annotated in the training datasets. This results in important limitations for real-world applications where one might need to perform tasks guided by novel, open-vocabulary queries related to a wide variety of objects. Recently, open-vocabulary 3D scene understanding methods have emerged to address this problem by learning queryable features for each point in the scene. While such a representation can be directly employed to perform semantic segmentation, existing methods cannot separate multiple object instances. In this work, we address this limitation, and propose OpenMask3D, which is a zero-shot approach for open-vocabulary 3D instance segmentation. Guided by predicted class-agnostic 3D instance masks, our model aggregates per-mask features via multi-view fusion of CLIP-based image embeddings. Experiments and ablation studies on ScanNet200 and Replica show that OpenMask3D outperforms other open-vocabulary methods, especially on the long-tail distribution. Qualitative experiments further showcase OpenMask3D's ability to segment object properties based on free-form queries describing geometry, affordances, and materials.
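The aggregation and querying steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that every instance mask already has one image embedding per camera view (random or hand-made arrays stand in for CLIP outputs here), that views are fused by a visibility-weighted average, and that a text query is matched by cosine similarity. The function names `aggregate_mask_features` and `rank_masks` and the `visibility` weights are hypothetical, introduced only for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis` (eps avoids division by zero)."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def aggregate_mask_features(view_embeddings, visibility):
    """Fuse per-view embeddings into one feature per mask.

    view_embeddings: (M, V, D) array, one D-dim embedding per mask and view.
    visibility:      (M, V) non-negative weights (assumed, e.g. visible-point
                     counts); a weight of 0 drops that view for that mask.
    Returns an (M, D) array of unit-length per-mask features.
    """
    total = np.maximum(visibility.sum(axis=1, keepdims=True), 1e-12)
    weights = visibility / total
    fused = (weights[..., None] * view_embeddings).sum(axis=1)
    return l2_normalize(fused)


def rank_masks(mask_features, query_embedding):
    """Rank masks by cosine similarity to a query embedding (best first)."""
    query = l2_normalize(query_embedding)
    scores = mask_features @ query
    order = np.argsort(-scores)
    return order, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(5, 4, 16))   # 5 masks, 4 views, 16 dims
    visibility = rng.uniform(size=(5, 4))      # stand-in visibility weights
    features = aggregate_mask_features(embeddings, visibility)
    order, scores = rank_masks(features, rng.normal(size=16))
    print("best mask index:", order[0])
```

In a real pipeline the embeddings would come from an image encoder applied to crops of each mask, and the query embedding from the matching text encoder. Because each mask keeps its own feature vector, two objects of the same category remain separate instances when ranked against a query.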

Ayça Takmaz, Elisabetta Fedele, Robert W. Sumner, Marc Pollefeys, Federico Tombari, Francis Engelmann • 2023

Related benchmarks

Task                                     Dataset                              Result         Rank
3D Semantic Segmentation                 ScanNet (val)                        mIoU 34        144
3D Instance Segmentation                 ScanNet200 (val)                     mAP 15.4       78
Instance Segmentation                    ScanNet200 (val)                     mAP@50 19.9    72
3D Instance Segmentation                 ScanNet200                           mAP@0.5 19.9   63
Semantic segmentation                    ScanNet V2                           mIoU 34        54
Semantic segmentation                    Stanford2D3D Panoramic 1.0 (Fold-1)  mIoU 29.8      53
Class-agnostic 3D instance segmentation  ScanNet200 (val)                     AP 39.7        19
3D Instance Segmentation                 Replica 8 scenes                     mAP 13.1       16
3D functionality segmentation            SceneFun3D (split0 val)              AP50 0.2       15
Semantic segmentation                    S3DIS (Area 2)                       mIoU 36.7      15
Showing 10 of 38 rows
