SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation

About

Open-vocabulary semantic segmentation aims to assign pixels to semantic groups drawn from an open set of categories. Most existing methods build on pre-trained vision-language models, where the key challenge is adapting an image-level model to the pixel-level segmentation task. In this paper, we propose a simple encoder-decoder, named SED, for open-vocabulary semantic segmentation, which comprises hierarchical encoder-based cost map generation and a gradual fusion decoder with category early rejection. The hierarchical encoder-based cost map generation employs a hierarchical backbone, instead of a plain transformer, to predict a pixel-level image-text cost map. Compared to a plain transformer, a hierarchical backbone better captures local spatial information and has linear computational complexity with respect to input size. Our gradual fusion decoder employs a top-down structure to combine the cost map with the feature maps of different backbone levels for segmentation. To accelerate inference, we introduce a category early rejection scheme in the decoder that rejects many non-existing categories at an early layer of the decoder, yielding up to 4.7 times acceleration without accuracy degradation. Experiments on multiple open-vocabulary semantic segmentation datasets demonstrate the efficacy of our SED method. When using ConvNeXt-B, SED achieves an mIoU of 31.6% on ADE20K with 150 categories at 82 milliseconds (ms) per image on a single A6000. We will release it at https://github.com/xb534/SED.git.
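
The two ideas named in the abstract, a pixel-level image-text cost map and category early rejection, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `cost_map` and `early_reject` and the parameter `k` are hypothetical, the cost map is assumed to be a cosine similarity between pixel and category embeddings, and the rejection rule is assumed to keep every category that ranks in the top `k` for at least one pixel. The learned encoder and decoder layers are omitted entirely.

```python
import numpy as np

def cost_map(pixel_feats, text_feats, eps=1e-8):
    """Cosine similarity between each pixel embedding and each category embedding.

    pixel_feats: (H, W, D) array of per-pixel image embeddings (assumed given).
    text_feats:  (N, D) array of per-category text embeddings (assumed given).
    Returns an (H, W, N) cost map.
    """
    p = pixel_feats / (np.linalg.norm(pixel_feats, axis=-1, keepdims=True) + eps)
    t = text_feats / (np.linalg.norm(text_feats, axis=-1, keepdims=True) + eps)
    return p @ t.T  # matmul broadcasts over the H and W axes

def early_reject(cost, k):
    """Drop categories that are not among the top-k responses of any pixel.

    cost: (H, W, N) cost map; k: categories kept per pixel (1 <= k <= N).
    Returns the sorted indices of surviving categories and the reduced cost map.
    """
    n = cost.shape[-1]
    flat = cost.reshape(-1, n)
    # Indices of the k largest responses per pixel (order within the k is arbitrary).
    topk = np.argpartition(-flat, k - 1, axis=1)[:, :k]
    kept = np.unique(topk)  # union over all pixels
    return kept, cost[..., kept]

# Toy example: 4 categories, but the 2x2 image only contains categories 0 and 2.
text = np.eye(4)
pixels = np.zeros((2, 2, 4))
pixels[0, :, 0] = [2.0, 1.0]  # top row matches category 0
pixels[1, :, 2] = [1.0, 3.0]  # bottom row matches category 2
kept, reduced = early_reject(cost_map(pixels, text), k=1)
print(kept, reduced.shape)
```

Under these assumptions, later decoder layers would only process the surviving channels of the cost map, which is where a speed-up of the kind the abstract reports would come from; larger `k` keeps more categories and trades speed for safety.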

Bin Xie, Jiale Cao, Jin Xie, Fahad Shahbaz Khan, Yanwei Pang • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Semantic segmentation | ADE20K A-150 | mIoU 35.2 | 188 |
| Semantic segmentation | Pascal Context 59 | mIoU 60.6 | 164 |
| Semantic segmentation | LoveDA | mIoU 24.6 | 142 |
| Semantic segmentation | PASCAL-Context 59 class (val) | mIoU 60.6 | 125 |
| Semantic segmentation | Vaihingen | mIoU 39 | 95 |
| Semantic segmentation | ADE20K 847 | mIoU 13.9 | 83 |
| Semantic segmentation | PASCAL-Context 59 classes (test) | mIoU 60.6 | 75 |
| Semantic segmentation | Potsdam | mIoU 29.4 | 73 |
| Semantic segmentation | PASCAL-Context PC-459 | mIoU 22.6 | 69 |
| Semantic segmentation | iSAID | mIoU 51.2 | 68 |
Showing 10 of 40 rows
