OneFormer3D: One Transformer for Unified Point Cloud Segmentation
About
Semantic, instance, and panoptic segmentation of 3D point clouds have been addressed using task-specific models of distinct design. As a result, the similarity of all segmentation tasks and the implicit relationship between them have not been utilized effectively. This paper presents a unified, simple, and effective model addressing all these tasks jointly. The model, named OneFormer3D, performs instance and semantic segmentation consistently, using a group of learnable kernels, where each kernel is responsible for generating a mask for either an instance or a semantic category. These kernels are trained with a transformer-based decoder that takes unified instance and semantic queries as input. Such a design enables training the model end-to-end in a single run, so that it achieves top performance on all three segmentation tasks simultaneously. Specifically, OneFormer3D ranks 1st and sets a new state-of-the-art (+2.1 mAP50) on the ScanNet test leaderboard. We also demonstrate state-of-the-art results in semantic, instance, and panoptic segmentation on the ScanNet (+2.1 PQ), ScanNet200 (+3.8 mAP50), and S3DIS (+0.8 mIoU) datasets.
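The kernel-based mask prediction described above can be illustrated with a small sketch. This is not the authors' implementation: it assumes each kernel is a plain vector that is compared with per-point features by a dot product, followed by a sigmoid and a 0.5 threshold. The function `predict_masks`, the toy features, and the kernel values are all hypothetical and chosen only for illustration. In the actual model the kernels are learned and refined by the transformer decoder.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_masks(kernels, point_features, threshold=0.5):
    """Return one binary mask per kernel (hypothetical helper).

    masks[k][i] is True when point i is assigned to kernel k.
    """
    return [
        [sigmoid(dot(kernel, feat)) > threshold for feat in point_features]
        for kernel in kernels
    ]


# Toy scene: 4 points, each with a 2-dimensional feature (made-up values).
point_features = [[2.0, -2.0], [1.5, -1.0], [-2.0, 2.0], [-1.0, 3.0]]

# Assumed kernels: two for instances, two for semantic categories.
instance_kernels = [[1.0, 0.0], [0.0, 1.0]]
semantic_kernels = [[1.0, -1.0], [-1.0, 1.0]]

# A single unified set of kernels yields both kinds of masks in one pass.
kernels = instance_kernels + semantic_kernels
masks = predict_masks(kernels, point_features)

instance_masks = masks[:len(instance_kernels)]
semantic_masks = masks[len(instance_kernels):]
```

The point of the sketch is that instance and semantic masks come out of the same operation applied to one shared list of kernels, so no separate task-specific head is involved.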
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Semantic segmentation | S3DIS (Area 5) | mIoU | 72.4 | 799 |
| 3D Object Detection | ScanNet V2 (val) | mAP@0.25 | 76.9 | 352 |
| Semantic segmentation | S3DIS (6-fold) | mIoU (Mean IoU) | 75 | 315 |
| Semantic segmentation | ScanNet V2 (val) | mIoU | 76.6 | 288 |
| 3D Instance Segmentation | ScanNet V2 (val) | Average AP50 | 76.3 | 195 |
| 3D Instance Segmentation | ScanNet v2 (test) | mAP | 56.6 | 135 |
| 3D Object Detection | ScanNet | mAP@0.25 | 76.9 | 123 |
| 3D Instance Segmentation | S3DIS (Area 5) | mAP@50% IoU | 68.5 | 106 |
| 3D Semantic Segmentation | ScanNet (val) | mIoU | 76.6 | 100 |
| 3D Instance Segmentation | S3DIS (6-fold CV) | Mean Precision @50% IoU | 82.3 | 92 |