MSTA3D: Multi-scale Twin-attention for 3D Instance Segmentation
About
Recently, transformer-based techniques incorporating superpoints have become prevalent in 3D instance segmentation. However, they often suffer from over-segmentation, which is especially noticeable for large objects. Additionally, unreliable superpoint-level mask predictions further compound this issue. To address these challenges, we propose a novel framework called MSTA3D. It leverages multi-scale feature representations and introduces a twin-attention mechanism to capture these multi-scale features effectively. Furthermore, MSTA3D integrates a box query with a box regularizer, offering a complementary spatial constraint alongside semantic queries. Experimental evaluations on the ScanNetV2, ScanNet200, and S3DIS datasets demonstrate that our approach surpasses state-of-the-art 3D instance segmentation methods.
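The abstract does not describe MSTA3D's internals, so what follows is only a generic sketch of how superpoint-based transformer pipelines commonly turn a query into a point-level mask. It assumes mean pooling of point features into superpoints and a dot-product-plus-sigmoid mask head. The helpers `superpoint_pool` and `query_mask`, and the 0.5 threshold, are hypothetical; this is not the MSTA3D implementation.

```python
import math


def superpoint_pool(point_feats, sp_ids, num_sp):
    """Hypothetical helper: mean-pool per-point feature vectors into per-superpoint features."""
    dim = len(point_feats[0])
    sums = [[0.0] * dim for _ in range(num_sp)]
    counts = [0] * num_sp
    for feat, sp in zip(point_feats, sp_ids):
        counts[sp] += 1
        for d in range(dim):
            sums[sp][d] += feat[d]
    # Empty superpoints keep a zero vector instead of dividing by zero.
    return [[s / c for s in row] if c else row for row, c in zip(sums, counts)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def query_mask(query, sp_feats, sp_ids, threshold=0.5):
    """Hypothetical mask head: score each superpoint against one query embedding
    (dot product + sigmoid), then copy the binary superpoint decision to its points."""
    sp_mask = [sigmoid(sum(q * f for q, f in zip(query, feat))) >= threshold
               for feat in sp_feats]
    return [sp_mask[sp] for sp in sp_ids]


# Usage: four points grouped into two superpoints.
point_feats = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
sp_ids = [0, 0, 1, 1]
sp_feats = superpoint_pool(point_feats, sp_ids, num_sp=2)
mask = query_mask([1.0, -1.0], sp_feats, sp_ids)
print(mask)  # [True, True, False, False]
```

Because the mask is decided once per superpoint and then copied to every point inside it, a single wrong superpoint decision mislabels all of its points, which is one way superpoint-level errors can compound.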
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D Instance Segmentation | ScanNet V2 (val) | Average AP50 | 77 | 195 |
| 3D Instance Segmentation | ScanNet V2 (test) | mAP | 56.9 | 135 |
| 3D Instance Segmentation | S3DIS (Area 5) | mAP@50% IoU | 70 | 106 |
| 3D Instance Segmentation | ScanNet200 (val) | mAP | 26.2 | 52 |
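The metric names in the table refer to average precision over instance masks. As a rough illustration, the sketch below computes a simplified, non-interpolated AP at a single IoU threshold of 0.5 (AP50) for one class. It represents masks as sets of point indices and matches predictions greedily in descending score order. The function names are hypothetical, and this is not the official ScanNet or S3DIS evaluation code, which uses its own matching and interpolation rules; mAP is then a mean of such per-class AP values.

```python
def mask_iou(a, b):
    """IoU between two instance masks given as sets of point indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def average_precision(preds, gts, iou_thresh=0.5):
    """Simplified, non-interpolated AP for one class (hypothetical helper).

    preds: list of (score, set_of_point_ids); gts: list of sets of point ids.
    Each prediction, in descending score order, is matched to the unmatched
    ground-truth mask with the highest IoU >= iou_thresh. Every true positive
    adds (precision at its rank) / (number of ground-truth masks).
    """
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    tp = 0
    ap = 0.0
    for rank, (_, mask) in enumerate(sorted(preds, key=lambda p: -p[0]), start=1):
        best, best_iou = -1, iou_thresh
        for g, gt in enumerate(gts):
            iou = mask_iou(mask, gt)
            if not matched[g] and iou >= best_iou:
                best, best_iou = g, iou
        if best >= 0:  # true positive
            matched[best] = True
            tp += 1
            ap += (tp / rank) / len(gts)
    return ap


# Usage: two ground-truth instances, three scored predictions (one false positive).
gts = [{0, 1, 2, 3}, {4, 5, 6, 7}]
preds = [(0.9, {0, 1, 2}), (0.8, {8, 9}), (0.7, {4, 5, 6, 7})]
print(round(average_precision(preds, gts), 4))  # 0.8333
```

The first prediction overlaps the first instance with IoU 0.75 and counts at precision 1; the second overlaps nothing; the third matches the second instance at rank 3 with precision 2/3, giving AP = (1 + 2/3) / 2.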