Open-Vocabulary Semantic Segmentation with Image Embedding Balancing

About

Open-vocabulary semantic segmentation is a challenging task that requires the model to output semantic masks of an image beyond a closed-set vocabulary. Although many efforts have been made to use powerful CLIP models for this task, they still easily overfit to training classes because of the natural gaps in semantic information between training and new classes. To overcome this challenge, we propose a novel framework for open-vocabulary semantic segmentation called EBSeg, incorporating an Adaptively Balanced Decoder (AdaB Decoder) and a Semantic Structure Consistency loss (SSC Loss). The AdaB Decoder is designed to generate different image embeddings for both training and new classes. Subsequently, these two types of embeddings are adaptively balanced to fully exploit their ability to recognize training classes and their generalization ability for new classes. To learn a consistent semantic structure from CLIP, the SSC Loss aligns the inter-class affinity in the image feature space with that in the text feature space of CLIP, thereby improving the generalization ability of our model. Furthermore, we employ a frozen SAM image encoder to complement the spatial information that CLIP features lack due to the low training image resolution and image-level supervision inherent in CLIP. Extensive experiments conducted across various benchmarks demonstrate that the proposed EBSeg outperforms the state-of-the-art methods. Our code and trained models will be available at: https://github.com/slonetime/EBSeg.

Xiangheng Shan, Dongyue Wu, Guilin Zhu, Yuanjie Shao, Nong Sang, Changxin Gao • 2024
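
The SSC Loss described in the abstract aligns inter-class affinity in the image feature space with that in CLIP's text feature space. A minimal sketch of that idea follows, using only the Python standard library. The abstract does not give the exact formulation, so everything here is an assumption for illustration: the function names `cosine_affinity` and `ssc_loss_sketch` are hypothetical, cosine similarity stands in for the affinity measure, and a mean squared difference stands in for the alignment penalty. It should not be read as the authors' implementation.

```python
import math


def cosine_affinity(embeddings, eps=1e-12):
    """Pairwise cosine-similarity matrix for a list of per-class vectors."""
    normed = []
    for vec in embeddings:
        # Guard against zero-length vectors with a small floor on the norm.
        norm = max(math.sqrt(sum(x * x for x in vec)), eps)
        normed.append([x / norm for x in vec])
    return [[sum(a * b for a, b in zip(u, v)) for v in normed] for u in normed]


def ssc_loss_sketch(image_class_embeds, text_class_embeds):
    """Mean squared gap between image-side and text-side class affinities.

    Assumed form only: the affinity measure and the penalty are
    illustrative choices, not taken from the paper.
    """
    a_img = cosine_affinity(image_class_embeds)
    a_txt = cosine_affinity(text_class_embeds)
    n = len(a_img)
    total = sum(
        (a_img[i][j] - a_txt[i][j]) ** 2 for i in range(n) for j in range(n)
    )
    return total / (n * n)


# Text embeddings for two unrelated classes (orthogonal vectors).
text = [[1.0, 0.0], [0.0, 1.0]]
# Image embeddings that are rotated and rescaled but keep the same structure.
same_structure = [[0.0, 2.0], [-2.0, 0.0]]
# Image embeddings that collapse both classes onto one direction.
collapsed = [[1.0, 0.0], [1.0, 0.0]]

print(ssc_loss_sketch(same_structure, text))
print(ssc_loss_sketch(collapsed, text))
```

In this sketch the penalty depends only on the relationships between classes, not on the embeddings themselves, so image embeddings can differ from the text embeddings and still incur no loss as long as their pairwise structure matches.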

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Semantic segmentation | PASCAL VOC (val) | mIoU 96.4 | 338 |
| Semantic segmentation | ADE20K A-150 | mIoU 32.8 | 188 |
| Semantic segmentation | Pascal Context 59 | mIoU 60.2 | 164 |
| Semantic segmentation | PASCAL-Context 59 class (val) | mIoU 60.2 | 125 |
| Semantic segmentation | ADE20K 847 | mIoU 1.37e+3 | 83 |
| Semantic segmentation | PASCAL-Context 59 classes (test) | mIoU 60.2 | 75 |
| Semantic segmentation | ADE20K A-847 (val) | mIoU 13.7 | 70 |
| Semantic segmentation | PASCAL-Context PC-459 | mIoU 21 | 69 |
| Semantic segmentation | Pascal Context 59 | mIoU 60.2 | 67 |
| Semantic segmentation | ADE20K A-150 (val) | mIoU 32.8 | 65 |
Showing 10 of 24 rows
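
The results above are reported as mIoU (mean intersection over union), normally expressed as a percentage. As a reminder of what the metric measures, here is a minimal sketch over flattened label maps. The function name `mean_iou` is hypothetical, and skipping classes that appear in neither the prediction nor the ground truth is an assumed convention; benchmark toolkits differ in details such as ignore labels and dataset-level accumulation.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes present in the prediction or the ground truth.

    pred and target are equal-length sequences of integer class ids,
    one entry per pixel.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumed convention: skip classes absent from both
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Four pixels, two classes: one pixel of class 1 is mislabeled as class 0.
pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
print(100 * mean_iou(pred, target, num_classes=2))  # as a percentage
```

Here class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.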
