You Only Segment Once: Towards Real-Time Panoptic Segmentation

About

In this paper, we propose YOSO, a real-time panoptic segmentation framework. YOSO predicts masks via dynamic convolutions between panoptic kernels and image feature maps, in which you only need to segment once for both instance and semantic segmentation tasks. To reduce the computational overhead, we design a feature pyramid aggregator for the feature map extraction, and a separable dynamic decoder for the panoptic kernel generation. The aggregator re-parameterizes interpolation-first modules in a convolution-first way, which significantly speeds up the pipeline without any additional costs. The decoder performs multi-head cross-attention via separable dynamic convolution for better efficiency and accuracy. To the best of our knowledge, YOSO is the first real-time panoptic segmentation framework that delivers competitive performance compared to state-of-the-art models. Specifically, YOSO achieves 46.4 PQ, 45.6 FPS on COCO; 52.5 PQ, 22.6 FPS on Cityscapes; 38.0 PQ, 35.4 FPS on ADE20K; and 34.1 PQ, 7.1 FPS on Mapillary Vistas. Code is available at https://github.com/hujiecpp/YOSO.
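
The central idea in the abstract, predicting every mask with one dynamic convolution between panoptic kernels and a shared feature map, can be sketched in a few lines. This is a minimal NumPy illustration and not the authors' implementation: the function names `predict_masks` and `panoptic_map`, the tensor sizes, and the argmax-based merging step are assumptions made for illustration, and the random arrays stand in for the outputs of the feature pyramid aggregator and the separable dynamic decoder.

```python
import numpy as np

def predict_masks(kernels, features):
    """Apply N kernels of shape (C,) to a (C, H, W) feature map.

    A 1x1 dynamic convolution is a dot product over channels at every pixel,
    so all N mask logits come out of a single contraction.
    """
    return np.einsum("nc,chw->nhw", kernels, features)

def panoptic_map(mask_logits, class_scores):
    """Merge N soft masks into one segment id and one class id per pixel.

    Assumed post-processing (hypothetical, for illustration only): weight each
    mask probability by its kernel's best class score and take the per-pixel argmax.
    """
    probs = 1.0 / (1.0 + np.exp(-mask_logits))      # sigmoid over mask logits
    confidence = class_scores.max(axis=1)           # (N,) best class score per kernel
    labels = class_scores.argmax(axis=1)            # (N,) predicted class per kernel
    segment_ids = (probs * confidence[:, None, None]).argmax(axis=0)  # (H, W)
    return segment_ids, labels[segment_ids]

# Stand-in inputs: 5 kernels, 8 channels, a 4x6 feature map, 3 classes.
rng = np.random.default_rng(0)
N, C, H, W, K = 5, 8, 4, 6, 3
kernels = rng.standard_normal((N, C))
features = rng.standard_normal((C, H, W))
class_scores = rng.random((N, K))

mask_logits = predict_masks(kernels, features)
segment_ids, class_ids = panoptic_map(mask_logits, class_scores)
print(mask_logits.shape, segment_ids.shape, class_ids.shape)
```

Because "thing" and "stuff" kernels go through the same contraction, instance and semantic masks come from one pass over the feature map, which is the sense in which the image is segmented only once.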

Jie Hu, Linyan Huang, Tianhe Ren, Shengchuan Zhang, Rongrong Ji, Liujuan Cao • 2023

Related benchmarks

Task                    Dataset                  Result        Rank
Instance Segmentation   COCO 2017 (val)          --            1201
Semantic Segmentation   ADE20K                   mIoU 44.7     1024
Semantic Segmentation   Cityscapes               mIoU 79.4     658
Instance Segmentation   COCO (val)               APmk 35.6     475
Panoptic Segmentation   Cityscapes (val)         PQ 59.7       276
Panoptic Segmentation   COCO (val)               PQ 48.4       219
Panoptic Segmentation   COCO 2017 (val)          PQ 48.4       185
Panoptic Segmentation   ADE20K (val)             PQ 38         89
Panoptic Segmentation   Mapillary Vistas (val)   PQ 34.1       82
Semantic Segmentation   COCO 2017 (val)          mIoU 58.74    66
Showing 10 of 12 rows
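
For readers unfamiliar with the PQ column, the sketch below computes panoptic quality from its standard definition: the sum of IoUs over matched segment pairs, divided by TP + 0.5 FP + 0.5 FN, where a predicted and a ground-truth segment count as matched when their IoU exceeds 0.5. The function name `panoptic_quality` and the toy numbers are illustrative assumptions, not values from the paper. Benchmarks normally compute PQ per class and average over classes, and report it on a 0 to 100 scale as in the table above.

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """PQ for one class.

    matched_ious: IoU of every matched (true positive) segment pair, each > 0.5.
    num_fp: predicted segments with no match.
    num_fn: ground-truth segments with no match.
    """
    tp = len(matched_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    if denom == 0:
        return 0.0  # class absent from both prediction and ground truth
    return sum(matched_ious) / denom

# Toy example (made-up numbers): three matches, one false positive, one false negative.
ious = [0.9, 0.8, 0.6]
pq = panoptic_quality(ious, num_fp=1, num_fn=1)
print(f"{pq:.3f}")  # 2.3 / 4 -> 0.575
```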
