Beyond the Final Layer: Hierarchical Query Fusion Transformer with Agent-Interpolation Initialization for 3D Instance Segmentation
About
3D instance segmentation aims to predict a set of object instances in a scene and represent them as binary foreground masks with corresponding semantic labels. Transformer-based methods are gaining increasing attention due to their elegant pipelines, reduced manual selection of geometric properties, and superior performance. However, these methods fail to maintain strong position and content information simultaneously during query initialization. In addition, because every decoder layer is supervised, objects can disappear from the predictions as the decoder deepens. To overcome these hurdles, we introduce Beyond the Final Layer: Hierarchical Query Fusion Transformer with Agent-Interpolation Initialization for 3D Instance Segmentation (BFL). Specifically, an Agent-Interpolation Initialization Module is designed to generate resilient queries that balance foreground coverage and content learning. A Hierarchical Query Fusion Decoder is designed to retain low-overlap queries, mitigating the decrease in recall as the layers deepen. Extensive experiments on the ScanNetV2, ScanNet200, ScanNet++, and S3DIS datasets demonstrate the superior performance of BFL.
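The idea of retaining low-overlap queries between decoder layers can be illustrated with a minimal sketch. The abstract does not specify the overlap measure or the selection rule, so everything here is an assumption made for illustration: overlap is taken to be mask IoU, a fixed number of queries is kept, and the names `mask_iou`, `retain_low_overlap`, and `keep` are hypothetical rather than taken from the BFL implementation.

```python
import numpy as np

def mask_iou(a, b):
    """Pairwise IoU between two sets of binary masks of shape (Na, P) and (Nb, P)."""
    a = a.astype(bool)
    b = b.astype(bool)
    # Broadcast to (Na, Nb, P) and count points in the intersection and union.
    inter = (a[:, None, :] & b[None, :, :]).sum(axis=2)
    union = (a[:, None, :] | b[None, :, :]).sum(axis=2)
    # Guard against division by zero when both masks are empty.
    return inter / np.maximum(union, 1)

def retain_low_overlap(prev_masks, curr_masks, keep):
    """Return indices of the `keep` previous-layer queries least covered by the current layer.

    Assumption: a query's overlap is its maximum IoU with any current-layer mask.
    """
    max_iou = mask_iou(prev_masks, curr_masks).max(axis=1)
    return np.argsort(max_iou, kind="stable")[:keep]

# Toy example: three previous-layer masks and two current-layer masks over six points.
prev_masks = np.array([[1, 1, 0, 0, 0, 0],
                       [0, 0, 1, 1, 0, 0],
                       [0, 0, 0, 0, 1, 1]])
curr_masks = np.array([[1, 1, 0, 0, 0, 0],
                       [0, 0, 1, 0, 0, 0]])
kept = retain_low_overlap(prev_masks, curr_masks, keep=1)
```

In this toy case the third previous-layer mask has no counterpart in the current layer, so it is the one selected. Carrying such a query forward is one way an object that a deeper layer dropped could still be recovered.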
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 3D Object Detection | ScanNet V2 (val) | -- | -- | 352 |
| 3D Instance Segmentation | ScanNet V2 (val) | Average AP50 | 79.5 | 195 |
| 3D Instance Segmentation | ScanNet v2 (test) | mAP | 60.6 | 135 |
| 3D Instance Segmentation | S3DIS (Area 5) | mAP@50% IoU | 71.9 | 106 |
| Instance Segmentation | ScanNetV2 (val) | -- | -- | 58 |
| 3D Instance Segmentation | ScanNet++ V1 (val) | AP50 | 35.2 | 12 |
| 3D Instance Segmentation | ScanNet200 v2 (val) | mAP (%) | 30.5 | 10 |
| 3D Instance Segmentation | ScanNet++ V1 (test) | mAP | 22.2 | 7 |
| Semantic 3D instance segmentation | ScanNet++ (val) | AP | 25.3 | 6 |
| 3D Instance Segmentation | ScanNet++ (test) | mAP | 22.2 | 5 |