Phrase-Instance Alignment for Generalized Referring Segmentation
About
Generalized referring expressions can describe one object, several related objects, or none at all. Existing generalized referring segmentation (GRES) models treat all cases alike, predicting a single binary mask and ignoring how linguistic phrases correspond to distinct visual instances. To address this, we reformulate GRES as an instance-level reasoning problem, where the model first predicts multiple instance-aware object queries conditioned on the referring expression, then aligns each with its most relevant phrase. This alignment is enforced by a Phrase-Object Alignment (POA) loss that builds fine-grained correspondence between linguistic phrases and visual instances. Given these aligned object instance queries and their learned relevance scores, the final segmentation and the no-target case are both inferred through a unified relevance-weighted aggregation mechanism. This instance-aware formulation enables explicit phrase-instance grounding, interpretable reasoning, and robust handling of complex or null expressions. Extensive experiments on the gRefCOCO and Ref-ZOM benchmarks show that our method improves on the previous state of the art by 3.22% cIoU and 12.25% N-acc.
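The abstract does not give the exact form of the relevance-weighted aggregation, so the following is only a minimal sketch of one plausible reading: each instance query contributes its mask probability scaled by its relevance score, the per-pixel maximum is thresholded to obtain the final mask, and the expression is treated as no-target when no query is relevant enough. The function name `aggregate`, the max-based fusion, and both thresholds are assumptions made for illustration, not the authors' formulation.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def aggregate(mask_logits, relevance_logits,
              mask_thresh=0.5, no_target_thresh=0.5):
    """Fuse per-query instance masks into one mask (hypothetical sketch).

    mask_logits:      Q masks, each a list of H rows of W logits.
    relevance_logits: Q logits, one relevance score per instance query.
    Returns (mask, no_target): mask is an H x W grid of booleans,
    no_target is True when no query is judged relevant.
    """
    relevance = [sigmoid(r) for r in relevance_logits]
    height = len(mask_logits[0])
    width = len(mask_logits[0][0])

    # Assumption: the expression has no target if every query's
    # relevance falls below the threshold.
    if max(relevance) < no_target_thresh:
        return [[False] * width for _ in range(height)], True

    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Assumption: fuse by taking the largest relevance-weighted
            # mask probability over all queries at this pixel.
            fused = max(r * sigmoid(m[y][x])
                        for r, m in zip(relevance, mask_logits))
            row.append(fused > mask_thresh)
        mask.append(row)
    return mask, False


# Two queries on a 2x2 image: query 0 covers the top-left pixel,
# query 1 covers the bottom-right pixel.
masks = [
    [[10.0, -10.0], [-10.0, -10.0]],
    [[-10.0, -10.0], [-10.0, 10.0]],
]
# Only query 0 is relevant to the expression.
final_mask, no_target = aggregate(masks, [10.0, -10.0])
```

In this toy case only the top-left pixel survives, because the second query's mask is suppressed by its low relevance. Passing two low relevance logits instead yields an empty mask with `no_target` set, which is how a single mechanism could cover both the segmentation and the null-expression case.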
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Referring Expression Segmentation | RefCOCO (testA) | -- | 257 |
| Referring Expression Segmentation | RefCOCO+ (testA) | -- | 230 |
| Referring Expression Segmentation | RefCOCO+ (val) | -- | 223 |
| Referring Expression Segmentation | RefCOCO (testB) | -- | 213 |
| Referring Expression Segmentation | RefCOCO (val) | -- | 212 |
| Referring Expression Segmentation | RefCOCO+ (testB) | -- | 210 |
| Generalized Referring Expression Segmentation | gRefCOCO (testA) | cIoU 73.22 | 139 |
| Referring Expression Segmentation | RefCOCOg (val) | -- | 129 |
| Generalized Referring Expression Segmentation | gRefCOCO (val) | cIoU 68.94 | 123 |
| Generalized Referring Expression Segmentation | gRefCOCO (testB) | cIoU 63.88 | 121 |