Zero-shot Referring Image Segmentation with Global-Local Context Features

About

Referring image segmentation (RIS) aims to find a segmentation mask given a referring expression grounded to a region of the input image. Collecting labelled datasets for this task, however, is notoriously costly and labor-intensive. To overcome this issue, we propose a simple yet effective zero-shot referring image segmentation method by leveraging the pre-trained cross-modal knowledge from CLIP. In order to obtain segmentation masks grounded to the input text, we propose a mask-guided visual encoder that captures global and local contextual information of an input image. By utilizing instance masks obtained from off-the-shelf mask proposal techniques, our method is able to segment fine-detailed instance-level groundings. We also introduce a global-local text encoder where the global feature captures complex sentence-level semantics of the entire input expression while the local feature focuses on the target noun phrase extracted by a dependency parser. In our experiments, the proposed method outperforms several zero-shot baselines of the task and even the weakly supervised referring expression segmentation method by substantial margins. Our code is available at https://github.com/Seonghoon-Yu/Zero-shot-RIS.
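The selection step implied by the abstract can be sketched as follows: each mask proposal gets a global and a local visual feature, the expression gets a global and a local text feature, and the proposal whose blended visual feature best matches the blended text feature is returned. This is a minimal sketch under stated assumptions, not the authors' implementation. The feature vectors stand in for CLIP embeddings, which are not computed here. The convex blending, the weights `alpha` and `beta`, and the function names are hypothetical, since the abstract does not say how the global and local features are combined.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def blend(global_feat, local_feat, weight):
    """Assumed combination rule: convex mix of the two unit vectors, re-normalized."""
    g = l2_normalize(global_feat)
    l = l2_normalize(local_feat)
    mixed = [weight * a + (1.0 - weight) * b for a, b in zip(g, l)]
    return l2_normalize(mixed)


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(l2_normalize(u), l2_normalize(v)))


def select_mask(visual_feats, text_global, text_local, alpha=0.5, beta=0.5):
    """Pick the mask proposal that best matches the expression.

    visual_feats: list of (global_feat, local_feat) pairs, one per mask proposal.
    alpha, beta: hypothetical blending weights for the visual and text sides.
    Returns (index of the best proposal, list of similarity scores).
    """
    text = blend(text_global, text_local, beta)
    scores = [cosine(blend(g, l, alpha), text) for g, l in visual_feats]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores
```

With real embeddings, `visual_feats` would hold one pair per mask proposal and `text_global` / `text_local` would come from the full expression and the extracted target noun phrase, respectively.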

Seonghoon Yu, Paul Hongsuck Seo, Jeany Son • 2023

Related benchmarks

Task	Dataset	Result
Referring Expression Comprehension	RefCOCO+ (val)	--	354
Referring Expression Segmentation	RefCOCO (testA)	cIoU 35.3	315
Referring Expression Segmentation	RefCOCO+ (testA)	cIoU 24.9	288
Referring Image Segmentation	RefCOCO (val)	mIoU 48.77	274
Referring Expression Segmentation	RefCOCO+ (val)	cIoU 26.2	272
Referring Image Segmentation	RefCOCO+ (test-B)	mIoU 35.34	267
Referring Expression Segmentation	RefCOCO (val)	cIoU 24.9	261
Referring Expression Segmentation	RefCOCO (testB)	cIoU 24.7	259
Referring Expression Segmentation	RefCOCO+ (testB)	cIoU 25.8	256
Referring Image Segmentation	RefCOCO (test A)	mIoU 55	245

Showing 10 of 27 rows
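
The table reports two metrics, cIoU and mIoU, without defining them. The sketch below assumes the usual reading in referring segmentation: mIoU is the mean of per-sample IoU, while cIoU (cumulative IoU) divides the total intersection by the total union over all samples. Masks are represented as flat lists of 0/1 values purely for illustration, and the function names are hypothetical.

```python
def mask_iou_terms(pred, gt):
    """Return (intersection, union) pixel counts for two flat binary masks."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter, union


def mean_iou(pairs):
    """mIoU: average the IoU of each (prediction, ground truth) pair."""
    ious = []
    for pred, gt in pairs:
        inter, union = mask_iou_terms(pred, gt)
        ious.append(inter / union if union else 0.0)
    return sum(ious) / len(ious)


def cumulative_iou(pairs):
    """cIoU: total intersection over total union across all pairs."""
    total_inter = 0
    total_union = 0
    for pred, gt in pairs:
        inter, union = mask_iou_terms(pred, gt)
        total_inter += inter
        total_union += union
    return total_inter / total_union if total_union else 0.0
```

Because cIoU pools pixels across samples, large objects weigh more than small ones, so the two metrics can differ noticeably on the same predictions.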
