You Only Segment Once: Towards Real-Time Panoptic Segmentation

About

In this paper, we propose YOSO, a real-time panoptic segmentation framework. YOSO predicts masks via dynamic convolutions between panoptic kernels and image feature maps, in which you only need to segment once for both instance and semantic segmentation tasks. To reduce the computational overhead, we design a feature pyramid aggregator for the feature map extraction, and a separable dynamic decoder for the panoptic kernel generation. The aggregator re-parameterizes interpolation-first modules in a convolution-first way, which significantly speeds up the pipeline without any additional costs. The decoder performs multi-head cross-attention via separable dynamic convolution for better efficiency and accuracy. To the best of our knowledge, YOSO is the first real-time panoptic segmentation framework that delivers competitive performance compared to state-of-the-art models. Specifically, YOSO achieves 46.4 PQ, 45.6 FPS on COCO; 52.5 PQ, 22.6 FPS on Cityscapes; 38.0 PQ, 35.4 FPS on ADE20K; and 34.1 PQ, 7.1 FPS on Mapillary Vistas. Code is available at https://github.com/hujiecpp/YOSO.
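The mask prediction step described above, a dynamic convolution between panoptic kernels and an image feature map, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `predict_masks` and `panoptic_map`, the tensor shapes, and the confidence-weighted argmax used to merge masks into one panoptic map are all assumptions made for illustration. The only idea taken from the abstract is that every segment (thing or stuff) is produced by the same kernel-times-feature operation.

```python
import numpy as np


def predict_masks(kernels, features):
    """Apply N dynamic 1x1 kernels to a feature map.

    kernels:  (N, C) array, one hypothetical panoptic kernel per segment.
    features: (C, H, W) array, the shared image feature map.
    Returns mask logits of shape (N, H, W).
    """
    # A 1x1 dynamic convolution is a per-pixel dot product between
    # each kernel and the C-dimensional feature vector at that pixel.
    return np.einsum("nc,chw->nhw", kernels, features)


def panoptic_map(mask_logits, class_scores):
    """Merge per-kernel masks into one segment id per pixel.

    Assumed post-processing (not stated in the abstract): weight each
    mask probability by the kernel's best class score, then take the
    argmax over kernels at every pixel.
    """
    probs = 1.0 / (1.0 + np.exp(-mask_logits))   # sigmoid, (N, H, W)
    confidence = class_scores.max(axis=1)        # (N,)
    weighted = probs * confidence[:, None, None]
    return weighted.argmax(axis=0)               # (H, W)


# Toy inputs: 5 kernels, 16 channels, an 8x12 feature map, 3 classes.
rng = np.random.default_rng(0)
kernels = rng.standard_normal((5, 16))
features = rng.standard_normal((16, 8, 12))
class_scores = rng.random((5, 3))

logits = predict_masks(kernels, features)
segment_ids = panoptic_map(logits, class_scores)
print(logits.shape, segment_ids.shape)
```

Because instance and semantic segments come out of the same `einsum` call, the image is segmented once; the kernels alone decide which segment each mask represents.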

Jie Hu, Linyan Huang, Tianhe Ren, Shengchuan Zhang, Rongrong Ji, Liujuan Cao • 2023

Related benchmarks

Task                   Dataset                          Result     Rank
Semantic segmentation  ADE20K                           mIoU 44.7  936
Semantic segmentation  Cityscapes                       mIoU 79.4  578
Instance Segmentation  COCO (val)                       APmk 35.6  472
Panoptic Segmentation  Cityscapes (val)                 PQ 59.7    276
Panoptic Segmentation  COCO (val)                       PQ 48.4    219
Panoptic Segmentation  ADE20K (val)                     PQ 38      89
Panoptic Segmentation  Mapillary Vistas (val)           PQ 34.1    82
Panoptic Segmentation  ADE20K 150 categories (val)      PQ 38      6
