
CRIS: CLIP-Driven Referring Image Segmentation

About

Referring image segmentation aims to segment a referent via a natural linguistic expression. Due to the distinct data properties of text and images, it is challenging for a network to align text and pixel-level features well. Existing approaches use pretrained models to facilitate learning, yet they transfer language and vision knowledge separately, ignoring multi-modal correspondence information. Inspired by recent advances in Contrastive Language-Image Pretraining (CLIP), in this paper we propose an end-to-end CLIP-Driven Referring Image Segmentation framework (CRIS). To transfer multi-modal knowledge effectively, CRIS resorts to vision-language decoding and contrastive learning to achieve text-to-pixel alignment. More specifically, we design a vision-language decoder to propagate fine-grained semantic information from textual representations to each pixel-level activation, which promotes consistency between the two modalities. In addition, we present text-to-pixel contrastive learning to explicitly pull the text feature toward the related pixel-level features and push it away from irrelevant ones. Experimental results on three benchmark datasets demonstrate that our proposed framework significantly outperforms the state of the art without any post-processing. The code will be released.
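The text-to-pixel contrastive idea described above can be sketched as a binary cross-entropy loss over text-pixel similarities: pixels inside the referent mask should score high against the text embedding, and all other pixels low. This is a minimal illustrative sketch in NumPy, not the authors' released implementation; the function name and the use of a plain sigmoid-BCE formulation are assumptions for illustration.

```python
import numpy as np

def text_to_pixel_contrastive_loss(text_feat, pixel_feats, mask):
    """Sketch of a text-to-pixel contrastive loss (assumed formulation).

    text_feat:   (D,)   L2-normalized text embedding
    pixel_feats: (N, D) L2-normalized pixel embeddings
    mask:        (N,)   1.0 for pixels belonging to the referent, 0.0 otherwise
    """
    # Similarity of every pixel to the sentence embedding.
    logits = pixel_feats @ text_feat                 # (N,)
    probs = 1.0 / (1.0 + np.exp(-logits))            # sigmoid

    # Binary cross-entropy: referent pixels are positives, the rest negatives.
    eps = 1e-8
    loss = -(mask * np.log(probs + eps)
             + (1.0 - mask) * np.log(1.0 - probs + eps))
    return loss.mean()
```

With embeddings that already align text to the correct pixels, the loss is low; flipping the mask (treating referent pixels as negatives) raises it, which is exactly the gradient signal that drives the alignment.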

Zhaoqing Wang, Yu Lu, Qiang Li, Xunqiang Tao, Yandong Guo, Mingming Gong, Tongliang Liu• 2021

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Referring Image Segmentation | RefCOCO (val) | mIoU | 70.47 | 259 |
| Referring Expression Segmentation | RefCOCO (testA) | cIoU | 73.2 | 257 |
| Referring Image Segmentation | RefCOCO+ (testB) | mIoU | 53.7 | 252 |
| Referring Image Segmentation | RefCOCO (testA) | mIoU | 73.2 | 230 |
| Referring Expression Segmentation | RefCOCO+ (testA) | cIoU | 68.1 | 230 |
| Referring Expression Segmentation | RefCOCO+ (val) | cIoU | 65.3 | 223 |
| Medical Image Segmentation | BUSI (test) | Dice | 67.5 | 216 |
| Referring Expression Segmentation | RefCOCO (testB) | cIoU | 66.1 | 213 |
| Referring Expression Segmentation | RefCOCO (val) | cIoU | 70.5 | 212 |
| Referring Expression Segmentation | RefCOCO+ (testB) | cIoU | 53.7 | 210 |

Showing 10 of 115 rows.
