EOV-Seg: Efficient Open-Vocabulary Panoptic Segmentation

About

Open-vocabulary panoptic segmentation aims to segment and classify everything in diverse scenes across an unbounded vocabulary. Existing methods typically employ either a two-stage or a single-stage framework. The two-stage framework crops the image multiple times using masks produced by a mask generator and then extracts features from each crop, while the single-stage framework relies on a heavyweight mask decoder to make up for the lack of spatial position information through self-attention and cross-attention in multiple stacked Transformer blocks. Both approaches incur substantial computational overhead, which slows model inference. To fill this gap in efficiency, we propose EOV-Seg, a novel single-stage, shared, efficient, and spatial-aware framework designed for open-vocabulary panoptic segmentation. Specifically, EOV-Seg innovates in two aspects. First, a Vocabulary-Aware Selection (VAS) module is proposed to improve the semantic comprehension of visual aggregated features and alleviate the feature interaction burden on the mask decoder. Second, we introduce Two-way Dynamic Embedding Experts (TDEE), which efficiently utilize the spatial awareness capabilities of the ViT-based CLIP backbone. To the best of our knowledge, EOV-Seg is the first open-vocabulary panoptic segmentation framework designed for efficiency; it runs faster than state-of-the-art methods while achieving competitive performance. With COCO training only, EOV-Seg achieves 24.5 PQ, 32.1 mIoU, and 11.6 FPS on the ADE20K dataset, and its inference is 4-19 times faster than state-of-the-art methods. In particular, equipped with a ResNet50 backbone, EOV-Seg runs at 23.8 FPS with only 71M parameters on a single RTX 3090 GPU. Code is available at https://github.com/nhw649/EOV-Seg.
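To put the reported throughput figures in more familiar terms, the sketch below converts them to per-image latency. It assumes that frames are processed one at a time, so that latency is simply the reciprocal of FPS; the abstract does not state the batch size, and the helper name `fps_to_latency_ms` is hypothetical.

```python
def fps_to_latency_ms(fps: float) -> float:
    """Convert throughput (frames per second) to per-image latency in milliseconds.

    Assumption: one image per forward pass, so latency = 1 / FPS.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# FPS values as reported in the abstract.
reported_fps = {
    "EOV-Seg on ADE20K (COCO training only)": 11.6,
    "EOV-Seg with ResNet50 backbone": 23.8,
}

latencies = {name: fps_to_latency_ms(fps) for name, fps in reported_fps.items()}
for name, ms in latencies.items():
    print(f"{name}: {ms:.1f} ms per image")
```

Under this assumption, 11.6 FPS corresponds to roughly 86 ms per image and 23.8 FPS to roughly 42 ms per image.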

Hongwei Niu, Jie Hu, Jianghang Lin, Guannan Jiang, Shengchuan Zhang • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| Open Vocabulary Semantic Segmentation | ADE20K A-150 | mIoU 32.1 | 71 |
| Open Vocabulary Semantic Segmentation | PASCAL Context 59 (val) | mIoU 56.9 | 49 |
| Open Vocabulary Instance Segmentation | MARIS in-domain (val) | Overall Class mAP 49.53 | 28 |
| Salient Object Detection | UserSOD (test) | MAE 0.127 | 18 |
| Open Vocabulary Semantic Segmentation | PASCAL Context 459 (val) | mIoU 16.8 | 17 |
| Open Vocabulary Semantic Segmentation | ADE20K 847 (val) | mIoU 12.8 | 17 |
| Open Vocabulary Semantic Segmentation | PASCAL VOC-20 (val) | mIoU 94.8 | 15 |
| Open Vocabulary Semantic Segmentation | A-847, PC-459, A-150, PC-59, PAS-20 Average Combined (val) | mIoU 42.68 | 15 |
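As a quick consistency check, the combined average appears to be the unweighted mean of the five semantic segmentation mIoU values listed above. The sketch below assumes that the abbreviations A-847, PC-459, A-150, PC-59, and PAS-20 refer to the ADE20K 847, PASCAL Context 459, ADE20K A-150, PASCAL Context 59, and PASCAL VOC-20 rows respectively; this mapping is inferred from the dataset names and is not stated on the page.

```python
# mIoU values copied from the benchmark rows; the key-to-row mapping is an assumption.
miou = {
    "A-847": 12.8,   # ADE20K 847 (val)
    "PC-459": 16.8,  # PASCAL Context 459 (val)
    "A-150": 32.1,   # ADE20K A-150
    "PC-59": 56.9,   # PASCAL Context 59 (val)
    "PAS-20": 94.8,  # PASCAL VOC-20 (val)
}

# Unweighted mean across the five benchmarks.
average = sum(miou.values()) / len(miou)
print(f"Average mIoU: {average:.2f}")
```

The result rounds to 42.68, matching the listed combined figure.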