
Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation

About

This paper addresses text-supervised semantic segmentation, aiming to learn a model capable of segmenting arbitrary visual concepts within images by using only image-text pairs without dense annotations. Existing methods have demonstrated that contrastive learning on image-text pairs effectively aligns visual segments with the meanings of texts. We notice that there is a discrepancy between text alignment and semantic segmentation: A text often consists of multiple semantic concepts, whereas semantic segmentation strives to create semantically homogeneous segments. To address this issue, we propose a novel framework, Image-Text Co-Decomposition (CoDe), where the paired image and text are jointly decomposed into a set of image regions and a set of word segments, respectively, and contrastive learning is developed to enforce region-word alignment. To work with a vision-language model, we present a prompt learning mechanism that derives an extra representation to highlight an image segment or a word segment of interest, with which more effective features can be extracted from that segment. Comprehensive experimental results demonstrate that our method performs favorably against existing text-supervised semantic segmentation methods on six benchmark datasets.
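The region-word alignment described above can be illustrated with a generic symmetric contrastive (InfoNCE-style) loss. This is a minimal sketch under stated assumptions, not the paper's actual objective: the function names, the toy two-dimensional embeddings, and the temperature of 0.07 are all hypothetical, and the decomposition and prompt learning steps that would produce the region and word-segment features are omitted. It assumes that row i of the region embeddings is paired with row i of the word-segment embeddings.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length; a zero vector is returned unchanged.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cosine_logits(regions, words, temperature):
    # Pairwise cosine similarities between regions and word segments,
    # divided by the temperature.
    r = [l2_normalize(v) for v in regions]
    w = [l2_normalize(v) for v in words]
    return [[sum(a * b for a, b in zip(ri, wj)) / temperature for wj in w]
            for ri in r]

def cross_entropy_diagonal(logits):
    # Mean cross-entropy where the correct column for row i is column i.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)

def region_word_contrastive_loss(regions, words, temperature=0.07):
    # Symmetric loss: region-to-word plus word-to-region, averaged.
    # temperature=0.07 is an assumed value, not taken from the paper.
    logits = cosine_logits(regions, words, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits)
                  + cross_entropy_diagonal(transposed))

# Toy embeddings (hypothetical): two regions and two word segments.
regions = [[1.0, 0.0], [0.0, 1.0]]
aligned = region_word_contrastive_loss(regions, [[0.9, 0.1], [0.1, 0.9]])
shuffled = region_word_contrastive_loss(regions, [[0.1, 0.9], [0.9, 0.1]])
print(aligned < shuffled)
```

The loss is low when each region is most similar to its own word segment and high when the pairing is scrambled, which is the behavior a region-word alignment objective relies on.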

Ji-Jia Wu, Andy Chia-Hao Chang, Chieh-Yu Chuang, Chun-Pei Chen, Yu-Lun Liu, Min-Hung Chen, Hou-Ning Hu, Yung-Yu Chuang, Yen-Yu Lin • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Semantic segmentation | ADE20K (val) | mIoU 17.7 | 2888 |
| Semantic segmentation | PASCAL VOC 2012 (val) | Mean IoU 57.7 | 2142 |
| Semantic segmentation | ADE20K | mIoU 17.7 | 1024 |
| Semantic segmentation | Cityscapes | mIoU 28.9 | 658 |
| Semantic segmentation | COCO Stuff | mIoU 23.9 | 379 |
| Semantic segmentation | Cityscapes (val) | mIoU 28.9 | 374 |
| Semantic segmentation | PASCAL Context (val) | mIoU 30.5 | 360 |
| Semantic segmentation | Pascal Context 59 | mIoU 30.5 | 204 |
| Semantic segmentation | Pascal Context 60 | mIoU 30.5 | 139 |
| Semantic segmentation | COCO Object | mIoU 32.3 | 129 |
Showing 10 of 38 rows
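
The results above are reported as mean intersection-over-union (mIoU). As a rough illustration of the metric, the sketch below computes mIoU over flattened per-pixel class labels. It is a simplified version under stated assumptions: the function name and toy labels are hypothetical, classes absent from both prediction and ground truth are skipped, and there is no handling of ignore labels or dataset-level accumulation, which benchmark protocols typically include.

```python
def mean_iou(pred, target, num_classes):
    # pred and target are flat sequences of integer class labels, one per pixel.
    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes
    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1
    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes that appear in neither labeling
            ious.append(intersection[c] / union)
    return sum(ious) / len(ious) if ious else 0.0

# Toy labels (hypothetical): six pixels, three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(round(100 * score, 1))  # scaled to a percentage, as in the table
```

Here the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5, or 50.0 when expressed as a percentage.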
