Progressive Prompt-Guided Cross-Modal Reasoning for Referring Image Segmentation

About

Referring image segmentation aims to localize and segment a target object in an image based on a free-form referring expression. The core challenge lies in effectively bridging linguistic descriptions with object-level visual representations, especially when referring expressions involve detailed attributes and complex inter-object relationships. Existing methods either rely on cross-modal alignment or employ semantic segmentation prompts, but they often lack explicit reasoning mechanisms for grounding language descriptions in target regions of the image. To address these limitations, we propose PPCR, a Progressive Prompt-guided Cross-modal Reasoning framework for referring image segmentation. PPCR explicitly structures the reasoning process as a three-stage Semantic Understanding → Spatial Grounding → Instance Segmentation pipeline. Specifically, PPCR first employs multimodal large language models (MLLMs) to generate Semantic Segmentation Prompts that capture key semantic cues of the target object. Based on this semantic context, Spatial Segmentation Prompts are then generated by reasoning about the object's location and spatial extent, enabling a progressive transition from semantic understanding to spatial grounding. The Semantic and Spatial Segmentation Prompts are jointly integrated into the segmentation module to guide accurate target localization and segmentation. Extensive experiments on standard referring image segmentation benchmarks demonstrate that PPCR consistently outperforms existing methods. The code will be publicly released to facilitate reproducibility.
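The three-stage flow described above can be sketched as plain function composition. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt containers (`SemanticPrompt`, `SpatialPrompt`), the box-shaped spatial prompt, `progressive_pipeline`, and the `toy_*` stage functions are all hypothetical, since the abstract does not specify prompt formats or model interfaces. In a real system the semantic stage would call an MLLM and the segmentation stage would be a learned mask decoder.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical prompt formats; the abstract does not specify them.
@dataclass
class SemanticPrompt:
    category: str                                        # target noun, e.g. "dog"
    attributes: List[str] = field(default_factory=list)  # remaining cue words

@dataclass
class SpatialPrompt:
    box: Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open pixel bounds

Size = Tuple[int, int]              # (height, width)
Mask = List[List[int]]              # binary mask indexed as mask[y][x]

def progressive_pipeline(
    image_hw: Size,
    expression: str,
    semantic_stage: Callable[[str], SemanticPrompt],
    spatial_stage: Callable[[Size, str, SemanticPrompt], SpatialPrompt],
    segment_stage: Callable[[Size, SemanticPrompt, SpatialPrompt], Mask],
) -> Mask:
    semantic = semantic_stage(expression)                    # 1. semantic understanding
    spatial = spatial_stage(image_hw, expression, semantic)  # 2. spatial grounding, conditioned on 1
    return segment_stage(image_hw, semantic, spatial)        # 3. segmentation guided by both prompts

# --- Toy stand-ins (hypothetical; a real system would use an MLLM and a mask decoder) ---

def toy_semantic(expression: str) -> SemanticPrompt:
    # Treat the last word as the category and the remaining words as attributes.
    words = expression.lower().split()
    return SemanticPrompt(category=words[-1], attributes=words[:-1])

def toy_spatial(image_hw: Size, expression: str, semantic: SemanticPrompt) -> SpatialPrompt:
    # Map a "left" attribute to the left half of the image, otherwise the right half.
    h, w = image_hw
    if "left" in semantic.attributes:
        return SpatialPrompt(box=(0, 0, w // 2, h))
    return SpatialPrompt(box=(w // 2, 0, w, h))

def toy_segment(image_hw: Size, semantic: SemanticPrompt, spatial: SpatialPrompt) -> Mask:
    # Fill the box; this toy ignores `semantic`, which a learned module would also use.
    h, w = image_hw
    x0, y0, x1, y1 = spatial.box
    return [[int(x0 <= x < x1 and y0 <= y < y1) for x in range(w)] for y in range(h)]

mask = progressive_pipeline((2, 4), "the left dog", toy_semantic, toy_spatial, toy_segment)
print(mask)  # [[1, 1, 0, 0], [1, 1, 0, 0]]
```

Passing each stage as a callable keeps the data dependency explicit: the spatial stage receives the semantic prompt, and the segmentation stage receives both, mirroring the progressive order the abstract describes.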

Jiachen Li, Hongyun Wang, Jinyu Xu, Wenbo Jiang, Yanchun Ma, Yongjian Liu, Qing Xie, Bolong Zheng• 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Referring Image Segmentation | RefCOCO (val) | mIoU | 81.1 | 259
Referring Image Segmentation | RefCOCO+ (test-B) | mIoU | 69.33 | 252
Referring Image Segmentation | RefCOCO (test-A) | mIoU | 83.69 | 230
Referring Image Segmentation | RefCOCO+ (val) | mIoU | 74.34 | 179
Referring Image Segmentation | RefCOCO (test-B) | mIoU | 78.33 | 171
Referring Image Segmentation | RefCOCO+ (test-A) | mIoU | 79.56 | 97
Referring Image Segmentation | RefCOCOg (val, U) | mIoU | 75.94 | 54
Referring Image Segmentation | RefCOCOg (test, U) | mIoU | 76.1 | 54
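The table reports mIoU. In the referring image segmentation literature, mIoU commonly denotes the mean of per-expression IoU, mIoU = (1/N) Σ_i |P_i ∩ G_i| / |P_i ∪ G_i| over predicted masks P_i and ground-truth masks G_i, and it is often reported alongside overall IoU (oIoU), the cumulative intersection divided by the cumulative union over the whole split. This page does not define the metric, so the sketch below assumes that convention; the listed results would correspond to these values multiplied by 100. The helper names (`intersection_union`, `mean_iou`, `overall_iou`) are illustrative.

```python
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # binary mask indexed as mask[y][x]; nonzero = foreground

def intersection_union(pred: Mask, gt: Mask) -> Tuple[int, int]:
    """Count pixels in the intersection and union of two same-shaped masks."""
    if len(pred) != len(gt):
        raise ValueError("masks have different heights")
    inter = union = 0
    for prow, grow in zip(pred, gt):
        if len(prow) != len(grow):
            raise ValueError("masks have different widths")
        for p, g in zip(prow, grow):
            inter += int(bool(p) and bool(g))
            union += int(bool(p) or bool(g))
    return inter, union

def mean_iou(preds: List[Mask], gts: List[Mask]) -> float:
    """mIoU: average of per-sample IoU, so every expression counts equally."""
    if len(preds) != len(gts) or not preds:
        raise ValueError("need equally many, non-zero predictions and ground truths")
    ious = []
    for pred, gt in zip(preds, gts):
        inter, union = intersection_union(pred, gt)
        # Both masks empty: count as a perfect match (a convention choice).
        ious.append(inter / union if union else 1.0)
    return sum(ious) / len(ious)

def overall_iou(preds: List[Mask], gts: List[Mask]) -> float:
    """oIoU: total intersection over total union, so large objects weigh more."""
    if len(preds) != len(gts) or not preds:
        raise ValueError("need equally many, non-zero predictions and ground truths")
    total_inter = total_union = 0
    for pred, gt in zip(preds, gts):
        inter, union = intersection_union(pred, gt)
        total_inter += inter
        total_union += union
    # Every mask empty: same perfect-match convention as mean_iou.
    return total_inter / total_union if total_union else 1.0

preds = [[[1, 1], [0, 0]], [[1, 1], [1, 1]]]
gts = [[[1, 0], [0, 0]], [[1, 1], [1, 1]]]
print(mean_iou(preds, gts))     # 0.75
print(overall_iou(preds, gts))  # 0.8333333333333334
```

In this toy example the first sample has IoU 1/2 and the second IoU 4/4, so mIoU is 0.75, while oIoU pools pixels as 5/6; the gap illustrates why the two metrics can rank methods differently.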