
Zero-shot Referring Image Segmentation with Global-Local Context Features

About

Referring image segmentation (RIS) aims to find a segmentation mask given a referring expression grounded to a region of the input image. Collecting labelled datasets for this task, however, is notoriously costly and labor-intensive. To overcome this issue, we propose a simple yet effective zero-shot referring image segmentation method that leverages the pre-trained cross-modal knowledge of CLIP. To obtain segmentation masks grounded to the input text, we propose a mask-guided visual encoder that captures both global and local contextual information of an input image. By utilizing instance masks obtained from off-the-shelf mask proposal techniques, our method is able to segment fine-detailed, instance-level groundings. We also introduce a global-local text encoder in which the global feature captures complex sentence-level semantics of the entire input expression, while the local feature focuses on the target noun phrase extracted by a dependency parser. In our experiments, the proposed method outperforms several zero-shot baselines for the task, and even a weakly supervised referring expression segmentation method, by substantial margins. Our code is available at https://github.com/Seonghoon-Yu/Zero-shot-RIS.
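The abstract pairs a global and a local feature on both the visual side (per mask proposal) and the text side (full sentence vs. target noun phrase). One plausible reading is that each mask proposal is then scored against the expression and the best-matching proposal is returned. The sketch below illustrates only that final matching step, in plain Python, assuming the embeddings are already computed (for example by CLIP encoders). The convex combination with weights `alpha` and `beta`, cosine similarity as the score, and the names `blend` and `select_mask` are illustrative assumptions, not the authors' implementation; the linked repository is the reference for that.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def _normalize(v: Vec) -> list[float]:
    # L2-normalize so that global and local features are on the same scale.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine(a: Vec, b: Vec) -> float:
    # Cosine similarity between two embeddings.
    return sum(x * y for x, y in zip(_normalize(a), _normalize(b)))


def blend(global_feat: Vec, local_feat: Vec, weight: float) -> list[float]:
    # ASSUMPTION: weight * global + (1 - weight) * local, on normalized features.
    g, l = _normalize(global_feat), _normalize(local_feat)
    return [weight * x + (1.0 - weight) * y for x, y in zip(g, l)]


def select_mask(
    mask_globals: Sequence[Vec],  # hypothetical: global-context embedding per mask proposal
    mask_locals: Sequence[Vec],   # hypothetical: local (masked-region) embedding per proposal
    text_global: Vec,             # embedding of the whole referring expression
    text_local: Vec,              # embedding of the target noun phrase
    alpha: float = 0.5,           # hypothetical visual global/local weight
    beta: float = 0.5,            # hypothetical text global/local weight
) -> int:
    """Return the index of the mask proposal that best matches the expression."""
    if not mask_globals or len(mask_globals) != len(mask_locals):
        raise ValueError("need one global and one local embedding per proposal")
    text = blend(text_global, text_local, beta)
    scores = [cosine(blend(g, l, alpha), text) for g, l in zip(mask_globals, mask_locals)]
    # Highest score wins; ties go to the earliest proposal.
    return max(range(len(scores)), key=scores.__getitem__)
```

With `alpha=1.0` only the global visual context decides, with `alpha=0.0` only the masked region does; intermediate values trade the two off, which is the intuition behind combining global and local context.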

Seonghoon Yu, Paul Hongsuck Seo, Jeany Son · 2023

Related benchmarks

Task                                Dataset            Metric  Result  Rank
Referring Expression Comprehension  RefCOCO+ (val)     --      --      345
Referring Expression Segmentation   RefCOCO (testA)    cIoU    35.3    217
Referring Expression Segmentation   RefCOCO+ (val)     cIoU    26.2    201
Referring Expression Segmentation   RefCOCO (testB)    cIoU    24.7    191
Referring Expression Segmentation   RefCOCO (val)      cIoU    24.9    190
Referring Expression Segmentation   RefCOCO+ (testA)   cIoU    24.9    190
Referring Expression Segmentation   RefCOCO+ (testB)   cIoU    25.8    188
Referring Expression Segmentation   RefCOCOg (val)     cIoU    44      107
Referring Expression Segmentation   RefCOCOg (test)    cIoU    31      78
Referring Expression Segmentation   RefCOCO UMD (val)  cIoU    32.9    50

(10 of 15 rows shown.)
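The segmentation results above are reported as cIoU. Assuming cIoU here denotes the cumulative (overall) IoU commonly used for referring segmentation benchmarks, it is the total intersection divided by the total union, both accumulated over the whole evaluation set, rather than an average of per-image IoUs. The sketch below, in plain Python with hypothetical function names, computes it on binary masks given as nested lists and contrasts it with mean IoU.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # H x W mask; any truthy value counts as foreground


def intersection_union(pred: Mask, gt: Mask) -> tuple[int, int]:
    """Pixel counts of pred AND gt, and pred OR gt, for a single image."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("pred and gt must have the same shape")
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            inter += int(bool(p) and bool(g))
            union += int(bool(p) or bool(g))
    return inter, union


def cumulative_iou(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """cIoU: sum of intersections over sum of unions across the dataset."""
    if len(preds) != len(gts):
        raise ValueError("need one prediction per ground-truth mask")
    total_inter = total_union = 0
    for pred, gt in zip(preds, gts):
        inter, union = intersection_union(pred, gt)
        total_inter += inter
        total_union += union
    return total_inter / total_union if total_union else 0.0


def mean_iou(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """mIoU: per-image IoU averaged over images, shown for contrast."""
    if len(preds) != len(gts):
        raise ValueError("need one prediction per ground-truth mask")
    ious = []
    for pred, gt in zip(preds, gts):
        inter, union = intersection_union(pred, gt)
        # Assumed convention: two empty masks count as a perfect match.
        ious.append(inter / union if union else 1.0)
    return sum(ious) / len(ious) if ious else 0.0


preds = [[[1, 1], [0, 0]], [[1, 1], [1, 1]]]
gts = [[[1, 0], [0, 0]], [[1, 1], [1, 1]]]
print(cumulative_iou(preds, gts))  # (1 + 4) / (2 + 4)
print(mean_iou(preds, gts))        # (0.5 + 1.0) / 2
```

Because cIoU pools pixels before dividing, large objects dominate it, whereas mean IoU weights every expression equally; the two can differ noticeably on the same predictions.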

Other info

Code: https://github.com/Seonghoon-Yu/Zero-shot-RIS
