AMLRIS: Alignment-aware Masked Learning for Referring Image Segmentation

About

Referring Image Segmentation (RIS) aims to segment the object in an image uniquely referred to by a natural language expression. However, RIS training data often contain hard-to-align and instance-specific visual signals; optimizing on such pixels injects misleading gradients and drives the model in the wrong direction. By explicitly estimating pixel-level vision-language alignment, the learner can suppress low-alignment regions, concentrate on reliable cues, and acquire more generalizable alignment features. In this paper, we propose Alignment-Aware Masked Learning (AML), a simple yet effective training strategy that quantifies region-referent alignment (PMME) and filters out unreliable pixels during optimization (AFM). Specifically, for each sample, AML first computes a similarity map between visual and textual features and then masks out pixels that fall below an adaptive similarity threshold, thereby excluding poorly aligned regions from the training process. AML requires no architectural changes and incurs no inference overhead, while directing attention to the areas aligned with the textual description. Experiments on the RefCOCO, RefCOCO+, and RefCOCOg datasets show that AML achieves state-of-the-art results across all 8 splits. Beyond improving RIS performance, AML also enhances the model's robustness to diverse descriptions and scenarios. Code is available at https://github.com/pipashu1/AMLRIS.
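The masking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); it only shows the general pattern of computing a per-pixel similarity to the text feature, thresholding it adaptively, and averaging the loss over the kept pixels. The function names, the cosine similarity measure, the mean-minus-k-standard-deviations threshold rule, and the binary cross-entropy loss are all assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def similarity_map(pixel_feats, text_feat):
    """One similarity score per pixel (pixels are flattened into a list)."""
    return [cosine(p, text_feat) for p in pixel_feats]


def adaptive_mask(sims, k=0.0):
    """Keep pixels whose similarity is at least mean - k * std.

    Assumption: this per-sample threshold rule is hypothetical; the paper
    only states that the threshold is adaptive.
    """
    mean = sum(sims) / len(sims)
    std = math.sqrt(sum((s - mean) ** 2 for s in sims) / len(sims))
    threshold = mean - k * std
    return [s >= threshold for s in sims]


def masked_bce(probs, targets, mask, eps=1e-7):
    """Binary cross-entropy averaged over the kept pixels only."""
    kept = [(p, t) for p, t, m in zip(probs, targets, mask) if m]
    if not kept:
        return 0.0
    total = sum(
        -(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps))
        for p, t in kept
    )
    return total / len(kept)


# Toy example: four pixels with 2-D features and a single text feature.
pixel_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
text_feat = [1.0, 0.0]
probs = [0.9, 0.8, 0.9, 0.1]   # predicted foreground probabilities
targets = [1, 1, 0, 0]         # ground-truth mask values

sims = similarity_map(pixel_feats, text_feat)
mask = adaptive_mask(sims)
loss = masked_bce(probs, targets, mask)
full_loss = masked_bce(probs, targets, [True] * len(probs))
```

In this toy case the two poorly aligned pixels are excluded, so they contribute no gradient signal. Because the mask only changes which pixels enter the training loss, a scheme like this leaves the network and the inference path untouched, which is consistent with the no-overhead claim above.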

Tongfei Chen, Shuo Yang, Yuguang Yang, Linlin Yang, Runtang Guo, Changbai Li, He Long, Chunyu Xie, Dawei Leng, Baochang Zhang • 2026

Related benchmarks

Task                          Dataset             Metric  Result  Rank
Referring Image Segmentation  RefCOCO (val)       mIoU    77.89   259
Referring Image Segmentation  RefCOCO+ (test-B)   mIoU    64.61   252
Referring Image Segmentation  RefCOCO (test A)    mIoU    79.53   230
Referring Image Segmentation  RefCOCO+ (val)      mIoU    71.33   179
Referring Image Segmentation  RefCOCO (test-B)    mIoU    74.99   171
Referring Image Segmentation  RefCOCOg (val)      oIoU    68.84   100
Referring Image Segmentation  RefCOCO+ (testA)    mIoU    75.61   97
Referring Image Segmentation  RefCOCOg (test)     oIoU    70.01   61