Contrastive Grouping with Transformer for Referring Image Segmentation

About

Referring image segmentation aims to segment the target referent in an image conditioned on a natural language expression. Existing one-stage methods employ per-pixel classification frameworks, which attempt to align vision and language directly at the pixel level and thus fail to capture critical object-level information. In this paper, we propose a mask classification framework, Contrastive Grouping with Transformer network (CGFormer), which explicitly captures object-level information via a token-based querying and grouping strategy. Specifically, CGFormer first introduces learnable query tokens to represent objects and then alternately queries linguistic features and groups visual features into the query tokens for object-aware cross-modal reasoning. In addition, CGFormer achieves cross-level interaction by jointly updating the query tokens and decoding masks in every two consecutive layers. Finally, CGFormer incorporates contrastive learning into the grouping strategy to identify the token and its mask corresponding to the referent. Experimental results demonstrate that CGFormer consistently and significantly outperforms state-of-the-art methods in both segmentation and generalization settings.
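To make the querying-and-grouping idea concrete, here is a minimal pure-Python sketch of a single layer under simplifying assumptions that are not taken from the paper: query tokens attend over word features with single-head dot-product attention and a residual update; each pixel is then softly assigned to the tokens with a softmax over tokens (the paper's actual grouping operator may differ, for example by using hard assignment); and the referent token is scored against a sentence feature with a temperature-scaled, InfoNCE-style contrastive loss. All function names (`query_language`, `group_visual`, `decode_masks`, `contrastive_loss`, `select_referent`) and the temperature `tau` are hypothetical. Learned projections, stacking of layers, and the cross-level mask decoding are omitted.

```python
import math
from typing import List, Tuple

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Vec) -> Vec:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def query_language(tokens: List[Vec], words: List[Vec]) -> List[Vec]:
    """Hypothetical querying step: each query token cross-attends over word
    features (single head, no learned projections) and adds the result."""
    scale = math.sqrt(len(tokens[0]))
    updated = []
    for t in tokens:
        weights = softmax([dot(t, w) / scale for w in words])
        attended = [sum(a * w[k] for a, w in zip(weights, words)) for k in range(len(t))]
        updated.append([t[k] + attended[k] for k in range(len(t))])
    return updated


def group_visual(tokens: List[Vec], pixels: List[Vec]) -> Tuple[List[Vec], List[Vec]]:
    """Hypothetical grouping step (soft assignment is an assumption): each pixel
    is distributed over tokens by a softmax over tokens, then each token adds
    the assignment-weighted mean of its pixels."""
    scale = math.sqrt(len(tokens[0]))
    assign = [softmax([dot(t, p) / scale for t in tokens]) for p in pixels]
    updated = []
    for j, t in enumerate(tokens):
        mass = sum(row[j] for row in assign)
        pooled = [sum(row[j] * p[k] for row, p in zip(assign, pixels)) / max(mass, 1e-6)
                  for k in range(len(t))]
        updated.append([t[k] + pooled[k] for k in range(len(t))])
    return updated, assign


def decode_masks(tokens: List[Vec], pixels: List[Vec]) -> List[Vec]:
    """One soft mask per token: sigmoid of the token-pixel dot product."""
    return [[1.0 / (1.0 + math.exp(-dot(t, p))) for p in pixels] for t in tokens]


def contrastive_loss(tokens: List[Vec], sentence: Vec, target: int, tau: float = 0.1) -> float:
    """InfoNCE-style loss: cross-entropy of the referent token among all tokens,
    scored by similarity to a sentence feature (tau is an assumed temperature)."""
    logits = [dot(t, sentence) / tau for t in tokens]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))  # log-sum-exp
    return log_z - logits[target]


def select_referent(tokens: List[Vec], pixels: List[Vec], sentence: Vec) -> Tuple[int, Vec]:
    """Mask classification: pick the token most similar to the sentence and
    return its index together with its decoded mask."""
    scores = [dot(t, sentence) for t in tokens]
    best = max(range(len(tokens)), key=scores.__getitem__)
    return best, decode_masks(tokens, pixels)[best]


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0]]              # two toy 2-D query tokens
    words = [[2.0, 0.0], [0.0, 0.5]]               # toy word features
    pixels = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # toy pixel features
    sentence = [1.0, 0.0]                          # toy sentence feature
    tokens = query_language(tokens, words)
    tokens, assign = group_visual(tokens, pixels)
    print(select_referent(tokens, pixels, sentence))
```

Because every token yields a whole mask, finding the referent becomes a choice among tokens rather than a per-pixel decision, which is the mask-classification framing the abstract contrasts with per-pixel classification.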

Jiajin Tang, Ge Zheng, Cheng Shi, Sibei Yang · 2023

Related benchmarks

Task	Dataset	Result
Referring Expression Segmentation	RefCOCO (testA)	cIoU 77.3	315
Referring Expression Segmentation	RefCOCO+ (testA)	cIoU 71	288
Referring Image Segmentation	RefCOCO (val)	mIoU 76.93	274
Referring Expression Segmentation	RefCOCO+ (val)	cIoU 64.54	272
Referring Image Segmentation	RefCOCO+ (testB)	mIoU 61.72	267
Referring Expression Segmentation	RefCOCO (val)	cIoU 74.75	261
Referring Expression Segmentation	RefCOCO (testB)	cIoU 70.64	259
Referring Expression Segmentation	RefCOCO+ (testB)	cIoU 57.14	256
Referring Image Segmentation	RefCOCO (testA)	mIoU 78.7	245
Referring Image Segmentation	RefCOCO+ (val)	mIoU 68.56	194

Showing 10 of 69 rows
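The table mixes two metrics, cIoU and mIoU. Under the definitions commonly used in referring segmentation papers (assumed here, since the benchmark listing does not spell out its evaluation protocol), cIoU accumulates intersection and union over the whole split before dividing, while mIoU averages the per-expression IoU. As a result, cIoU is weighted toward large objects. A minimal sketch on flattened binary masks; the function names are hypothetical:

```python
from typing import List, Sequence, Tuple

Mask = Sequence[int]  # flattened binary mask: 1 marks a referent pixel


def intersection_union(pred: Mask, gt: Mask) -> Tuple[int, int]:
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter, union


def ciou(samples: List[Tuple[Mask, Mask]]) -> float:
    """Cumulative IoU: total intersection over total union across all samples."""
    total_inter = total_union = 0
    for pred, gt in samples:
        inter, union = intersection_union(pred, gt)
        total_inter += inter
        total_union += union
    return total_inter / total_union if total_union else 0.0


def miou(samples: List[Tuple[Mask, Mask]]) -> float:
    """Mean IoU: average of per-sample IoU, so every expression counts equally."""
    ious = []
    for pred, gt in samples:
        inter, union = intersection_union(pred, gt)
        # Assumption: an empty prediction on an empty target counts as IoU 1.0.
        ious.append(inter / union if union else 1.0)
    return sum(ious) / len(ious)


samples = [
    ([1] * 8 + [0] * 4, [1] * 10 + [0] * 2),  # large object: 8 / 10
    ([1, 0, 0, 0], [1, 1, 0, 0]),            # small object: 1 / 2
]
print(ciou(samples), miou(samples))
```

Here cIoU is 9/12 = 0.75, pulled toward the large object's IoU of 0.8, while mIoU is (0.8 + 0.5)/2 = 0.65, giving the small object equal weight.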
