You Only Segment Once: Towards Real-Time Panoptic Segmentation

About

In this paper, we propose YOSO, a real-time panoptic segmentation framework. YOSO predicts masks via dynamic convolutions between panoptic kernels and image feature maps, in which you only need to segment once for both instance and semantic segmentation tasks. To reduce the computational overhead, we design a feature pyramid aggregator for the feature map extraction, and a separable dynamic decoder for the panoptic kernel generation. The aggregator re-parameterizes interpolation-first modules in a convolution-first way, which significantly speeds up the pipeline without any additional costs. The decoder performs multi-head cross-attention via separable dynamic convolution for better efficiency and accuracy. To the best of our knowledge, YOSO is the first real-time panoptic segmentation framework that delivers competitive performance compared to state-of-the-art models. Specifically, YOSO achieves 46.4 PQ, 45.6 FPS on COCO; 52.5 PQ, 22.6 FPS on Cityscapes; 38.0 PQ, 35.4 FPS on ADE20K; and 34.1 PQ, 7.1 FPS on Mapillary Vistas. Code is available at https://github.com/hujiecpp/YOSO.

Jie Hu, Linyan Huang, Tianhe Ren, Shengchuan Zhang, Rongrong Ji, Liujuan Cao • 2023
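
The abstract's core idea, predicting masks by dynamic convolution between panoptic kernels and an image feature map, can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes each kernel acts as a 1x1 convolution, i.e. a per-pixel dot product over channels, and that a sigmoid turns the response into a mask probability. The names `predict_masks`, `kernels`, and `features` are hypothetical, and plain Python lists are used only to keep the example self-contained.

```python
import math


def predict_masks(kernels, features):
    """Apply each kernel as a 1x1 dynamic convolution over the feature map.

    kernels:  N lists of C floats (one hypothetical panoptic kernel per segment)
    features: C x H x W nested lists (the image feature map)
    returns:  N x H x W nested lists of mask probabilities in (0, 1)
    """
    channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    masks = []
    for kernel in kernels:
        mask = []
        for y in range(height):
            row = []
            for x in range(width):
                # Dot product over channels: the response of a 1x1 convolution.
                logit = sum(kernel[c] * features[c][y][x] for c in range(channels))
                # Assumption: a sigmoid maps the response to a probability.
                row.append(1.0 / (1.0 + math.exp(-logit)))
            mask.append(row)
        masks.append(mask)
    return masks


# Toy input: 2 channels on a 2x2 grid, and 2 kernels with opposite preferences.
features = [
    [[1.0, 0.0], [0.0, 1.0]],  # channel 0 is active on the main diagonal
    [[0.0, 1.0], [1.0, 0.0]],  # channel 1 is active on the anti-diagonal
]
kernels = [[4.0, -4.0], [-4.0, 4.0]]
masks = predict_masks(kernels, features)
```

Because the same feature map is shared by all kernels, the image is processed once and every instance or semantic segment is read out by its own kernel, which is the sense in which the method segments "once".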

Related benchmarks

Task                    Dataset                  Result        Rank
Instance Segmentation   COCO 2017 (val)          --            1275
Semantic Segmentation   ADE20K                   mIoU 44.7     1028
Semantic Segmentation   Cityscapes               mIoU 79.4     668
Instance Segmentation   COCO (val)               APmk 35.6     485
Panoptic Segmentation   Cityscapes (val)         PQ 59.7       288
Panoptic Segmentation   COCO (val)               PQ 48.4       223
Panoptic Segmentation   COCO 2017 (val)          PQ 48.4       185
Panoptic Segmentation   ADE20K (val)             PQ 38         99
Panoptic Segmentation   Mapillary Vistas (val)   PQ 34.1       82
Semantic Segmentation   COCO 2017 (val)          mIoU 58.74    66
