Contrastive Grouping with Transformer for Referring Image Segmentation

About

Referring image segmentation aims to segment the target referent in an image conditioned on a natural language expression. Existing one-stage methods employ per-pixel classification frameworks, which attempt to align vision and language directly at the pixel level and thus fail to capture critical object-level information. In this paper, we propose a mask classification framework, Contrastive Grouping with Transformer network (CGFormer), which explicitly captures object-level information via a token-based querying and grouping strategy. Specifically, CGFormer first introduces learnable query tokens to represent objects and then alternately queries linguistic features and groups visual features into the query tokens for object-aware cross-modal reasoning. In addition, CGFormer achieves cross-level interaction by jointly updating the query tokens and decoding masks in every two consecutive layers. Finally, CGFormer incorporates contrastive learning into the grouping strategy to identify the token and its mask corresponding to the referent. Experimental results demonstrate that CGFormer consistently and significantly outperforms state-of-the-art methods in both segmentation and generalization settings.
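
The grouping idea in the abstract can be illustrated with a toy, inference-only sketch. It is not the authors' implementation: the function names (group_pixels, pick_referent), the soft assignment with a softmax over tokens, the residual update, and the dot-product scoring against a sentence vector are all assumptions made for illustration, and the contrastive training objective is omitted entirely.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def group_pixels(tokens, pixels):
    """Assign each pixel feature to the query tokens, then update the tokens.

    Assumption: the softmax runs over tokens (not pixels), so tokens
    compete for every pixel, which is what makes this a grouping step.
    """
    dim = len(tokens[0])
    # assign[p][n] is the share of pixel p claimed by token n.
    assign = [softmax([dot(t, px) for t in tokens]) for px in pixels]
    updated = []
    for n, token in enumerate(tokens):
        weight = sum(a[n] for a in assign)
        grouped = [
            sum(a[n] * px[d] for a, px in zip(assign, pixels)) / weight
            for d in range(dim)
        ]
        # Hypothetical residual update: old token plus its grouped feature.
        updated.append([t + g for t, g in zip(token, grouped)])
    return updated, assign

def pick_referent(tokens, sentence, assign):
    """Pick the token that best matches the sentence and return its mask."""
    scores = [dot(t, sentence) for t in tokens]
    best = max(range(len(tokens)), key=scores.__getitem__)
    # A pixel belongs to the mask if the chosen token claims most of it.
    mask = [
        1 if max(range(len(a)), key=a.__getitem__) == best else 0
        for a in assign
    ]
    return best, mask

# Two query tokens, three pixel features, one sentence embedding (toy data).
tokens = [[1.0, 0.0], [0.0, 1.0]]
pixels = [[5.0, 0.0], [4.0, 0.0], [0.0, 6.0]]
sentence = [0.0, 1.0]

updated, assign = group_pixels(tokens, pixels)
best, mask = pick_referent(updated, sentence, assign)
```

In this toy setup each token ends up representing one group of pixels, and the predicted mask is simply the group of the token that best matches the sentence. This mirrors the mask classification view described above, in contrast to classifying every pixel independently.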

Jiajin Tang, Ge Zheng, Cheng Shi, Sibei Yang • 2023

Related benchmarks

Task                                Dataset             Result        Rank
Referring Image Segmentation        RefCOCO (val)       mIoU 76.93    259
Referring Expression Segmentation   RefCOCO (testA)     cIoU 77.3     257
Referring Image Segmentation        RefCOCO+ (test-B)   mIoU 61.72    252
Referring Image Segmentation        RefCOCO (test A)    mIoU 78.7     230
Referring Expression Segmentation   RefCOCO+ (testA)    cIoU 71       230
Referring Expression Segmentation   RefCOCO+ (val)      cIoU 64.54    223
Referring Expression Segmentation   RefCOCO (testB)     cIoU 70.64    213
Referring Expression Segmentation   RefCOCO (val)       cIoU 74.75    212
Referring Expression Segmentation   RefCOCO+ (testB)    cIoU 57.14    210
Referring Image Segmentation        RefCOCO+ (val)      mIoU 68.56    179
Showing 10 of 69 rows
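
The results mix two metrics, mIoU and cIoU, which this page does not define. The sketch below assumes the definitions commonly used in the referring segmentation literature: mIoU averages the IoU of each sample, while cIoU (cumulative IoU) divides the total intersection by the total union over all samples. The function names and the handling of empty masks are illustrative choices, and the values are fractions rather than the percentages shown in the table.

```python
def intersection_and_union(pred, target):
    """Count intersecting and united pixels of two flat binary masks."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return inter, union

def mean_iou(preds, targets):
    """Average of per-sample IoU; every sample counts equally."""
    ious = []
    for pred, target in zip(preds, targets):
        inter, union = intersection_and_union(pred, target)
        # Assumption: two empty masks count as a perfect match.
        ious.append(inter / union if union else 1.0)
    return sum(ious) / len(ious)

def cumulative_iou(preds, targets):
    """Total intersection over total union; large objects weigh more."""
    total_inter = 0
    total_union = 0
    for pred, target in zip(preds, targets):
        inter, union = intersection_and_union(pred, target)
        total_inter += inter
        total_union += union
    return total_inter / total_union if total_union else 0.0

# Two toy samples: a small object segmented poorly, a large one perfectly.
preds = [[1, 1, 0, 0], [1, 1, 1, 1]]
targets = [[1, 0, 0, 0], [1, 1, 1, 1]]

miou = mean_iou(preds, targets)        # (0.5 + 1.0) / 2 = 0.75
ciou = cumulative_iou(preds, targets)  # (1 + 4) / (2 + 4) = 5/6
```

Because cIoU pools pixels across samples, it favors large objects, so the two metrics are not directly comparable across rows of the table.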
