
MP-Former: Mask-Piloted Transformer for Image Segmentation

About

We present a mask-piloted Transformer, MP-Former, which improves masked attention in Mask2Former for image segmentation. The improvement is based on our observation that Mask2Former suffers from inconsistent mask predictions between consecutive decoder layers, which leads to inconsistent optimization goals and low utilization of decoder queries. To address this problem, we propose a mask-piloted training approach, which additionally feeds noised ground-truth masks into masked attention and trains the model to reconstruct the original ones. Compared with the predicted masks used in masked attention, the ground-truth masks serve as a pilot and effectively alleviate the negative impact of inaccurate mask predictions in Mask2Former. Based on this technique, MP-Former achieves a remarkable performance improvement on all three image segmentation tasks (instance, panoptic, and semantic), yielding +2.3 AP and +1.6 mIoU over Mask2Former on the Cityscapes instance and semantic segmentation tasks with a ResNet-50 backbone. Our method also significantly speeds up training, outperforming Mask2Former with half the number of training epochs on ADE20K with both ResNet-50 and Swin-L backbones. Moreover, our method introduces only a small amount of extra computation during training and none during inference. Our code will be released at https://github.com/IDEA-Research/MP-Former.
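The mask-piloted idea can be illustrated with a short NumPy sketch. This is not the authors' implementation; it assumes single-head attention with no learned projections, random placeholder features, and a hypothetical `add_point_noise` helper that flips a random fraction of mask pixels as the noise model. Ordinary queries attend through masks binarized from the previous decoder layer's predictions (sigmoid > 0.5, i.e. logit > 0), while the extra "piloted" queries attend through noised ground-truth masks; both groups share one masked-attention call and differ only in their attention masks.

```python
import numpy as np


def add_point_noise(gt_mask: np.ndarray, flip_ratio: float,
                    rng: np.random.Generator) -> np.ndarray:
    """Hypothetical noise model: flip a random fraction of pixels in a binary mask."""
    flat = gt_mask.astype(bool).ravel().copy()
    n_flip = int(round(flip_ratio * flat.size))
    idx = rng.choice(flat.size, size=n_flip, replace=False)  # distinct pixels
    flat[idx] = ~flat[idx]
    return flat.reshape(gt_mask.shape)


def masked_attention(queries, keys, values, attn_masks):
    """Single-head masked attention in the style of Mask2Former.

    queries: (Q, d); keys, values: (HW, d);
    attn_masks: (Q, HW) boolean, True where a query may attend.
    """
    d = queries.shape[-1]
    logits = queries @ keys.T / np.sqrt(d)  # (Q, HW)
    # If a mask blocks every position, let that query attend everywhere
    # instead of producing an all -inf row (which would give NaNs).
    empty = ~attn_masks.any(axis=1, keepdims=True)
    allowed = attn_masks | empty
    logits = np.where(allowed, logits, -np.inf)
    # Numerically stable softmax over the allowed positions only.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values, weights


rng = np.random.default_rng(0)
H, W, d = 8, 8, 16
HW = H * W

# Two toy ground-truth masks (placeholder data).
gt = np.zeros((2, H, W), dtype=bool)
gt[0, :4, :] = True    # top half
gt[1, 4:, 4:] = True   # bottom-right quadrant

# Piloted queries: attention masks come from noised ground truth.
noised = np.stack([add_point_noise(m, 0.1, rng) for m in gt]).reshape(2, HW)

# Matching queries: attention masks come from previous-layer predictions.
pred_logits = rng.standard_normal((3, HW))
pred_masks = pred_logits > 0  # equivalent to sigmoid(logit) > 0.5

attn_masks = np.concatenate([pred_masks, noised], axis=0)  # (5, HW)
queries = rng.standard_normal((5, d))
keys = rng.standard_normal((HW, d))
values = rng.standard_normal((HW, d))

out, w = masked_attention(queries, keys, values, attn_masks)
```

During training, the outputs of the piloted queries would be supervised to reconstruct the original, un-noised ground-truth masks; at inference the piloted queries are simply not created, which is consistent with the claim that the method adds no inference-time computation.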

Hao Zhang, Feng Li, Huaizhe Xu, Shijia Huang, Shilong Liu, Lionel M. Ni, Lei Zhang• 2023

Related benchmarks

Task                   Dataset                  Result        Rank
Semantic segmentation  ADE20K (val)             mIoU 56.9     2731
Instance segmentation  COCO 2017 (val)          --            1144
Panoptic segmentation  Cityscapes (val)         PQ 67.5       276
Instance segmentation  Cityscapes (val)         AP 44.9       239
Panoptic segmentation  COCO 2017 (val)          PQ 58.1       172
Panoptic segmentation  ADE20K (val)             PQ 49.4       89
Instance segmentation  ADE20K (val)             --            21
Semantic segmentation  MaSS13K 500 (val)        mIoU 87.76    16
Semantic segmentation  MaSS 13K 1,500 (test)    mIoU 87.18    16
Panoptic segmentation  NuInsSeg                 PQ 44.73      9
