SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation

About

Open-vocabulary semantic segmentation aims to assign each pixel to a semantic group drawn from an open set of categories. Most existing methods build on pre-trained vision-language models, where the key challenge is adapting an image-level model to the pixel-level segmentation task. In this paper, we propose a simple encoder-decoder, named SED, for open-vocabulary semantic segmentation. It comprises a hierarchical encoder-based cost map generation and a gradual fusion decoder with category early rejection. The hierarchical encoder-based cost map generation employs a hierarchical backbone, instead of a plain transformer, to predict a pixel-level image-text cost map. Compared to a plain transformer, the hierarchical backbone better captures local spatial information and has linear computational complexity with respect to input size. Our gradual fusion decoder employs a top-down structure to combine the cost map with the feature maps of different backbone levels for segmentation. To accelerate inference, we introduce a category early rejection scheme in the decoder that rejects many non-existent categories at an early decoder layer, resulting in up to 4.7 times acceleration without accuracy degradation. Experiments on multiple open-vocabulary semantic segmentation datasets demonstrate the efficacy of our SED method. When using ConvNeXt-B, our SED method achieves an mIoU score of 31.6% on ADE20K with 150 categories at 82 milliseconds (ms) per image on a single A6000. We will release the code at https://github.com/xb534/SED.git.

Bin Xie, Jiale Cao, Jin Xie, Fahad Shahbaz Khan, Yanwei Pang • 2023
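
The two ideas named in the abstract, a pixel-level image-text cost map and category early rejection, can be illustrated with a minimal sketch. The abstract does not state the similarity measure or the rejection rule, so this sketch assumes cosine similarity between per-pixel features and per-category text embeddings, and a rejection rule that keeps the union of each pixel's top-k categories. The function names (`cost_map`, `early_reject`) and the toy vectors are hypothetical and are not taken from the released code.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def cost_map(pixel_feats, text_embeds):
    """Assumed cost map: cost[p][c] is the cosine similarity between
    pixel feature p and category text embedding c."""
    pix = [l2_normalize(p) for p in pixel_feats]
    txt = [l2_normalize(t) for t in text_embeds]
    return [[sum(a * b for a, b in zip(p, t)) for t in txt] for p in pix]


def early_reject(cost, k):
    """Assumed rejection rule: keep the union of every pixel's top-k
    categories and drop all other category columns from the cost map."""
    keep = set()
    for row in cost:
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        keep.update(ranked[:k])
    kept = sorted(keep)
    pruned = [[row[c] for c in kept] for row in cost]
    return kept, pruned


# Toy example: 3 pixels with 2-D features, 4 candidate categories.
pixel_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
text_embeds = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

cost = cost_map(pixel_feats, text_embeds)
kept, pruned = early_reject(cost, k=1)
print(kept)            # indices of the categories that survive rejection
print(len(pruned[0]))  # number of category columns later layers would process
```

Under these assumptions, later decoder layers would only process the surviving category columns, which is where a speed-up of the kind reported in the abstract could come from. The real decoder operates on feature maps from several backbone levels rather than on flat lists.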

Related benchmarks

Task	Dataset	Result
Semantic segmentation	ADE20K A-150	mIoU 35.2	224
Semantic segmentation	Pascal Context 59	mIoU 60.6	204
Semantic segmentation	LoveDA	mIoU 24.6	192
Semantic segmentation	PC-59	mIoU 60.6	174
Semantic segmentation	Vaihingen	mIoU 39.02	156
Semantic segmentation	iSAID	mIoU 94.32	146
Semantic segmentation	Pascal VOC 20	mIoU 96.1	130
Semantic segmentation	PASCAL-Context 59 class (val)	mIoU 60.6	125
Open Vocabulary Semantic Segmentation	Pascal VOC 20	mIoU 96.1	113
Semantic segmentation	Potsdam	mIoU 29.4	110

