Locate then Segment: A Strong Pipeline for Referring Image Segmentation

About

Referring image segmentation aims to segment the objects referred to by a natural language expression. Previous methods usually focus on designing an implicit and recurrent feature interaction mechanism that fuses the visual-linguistic features and directly generates the final segmentation mask, without explicitly modeling the localization information of the referent instances. To address this limitation, we view the task from another perspective by decoupling it into a "Locate-Then-Segment" (LTS) scheme. Given a language expression, people generally first attend to the corresponding target image regions, then generate a fine segmentation mask of the object based on its context. The LTS first extracts and fuses visual and textual features to get a cross-modal representation, then applies a cross-modal interaction on the visual-textual features to locate the referred object with a position prior, and finally generates the segmentation result with a light-weight segmentation network. Our LTS is simple but surprisingly effective. On three popular benchmark datasets, the LTS outperforms all the previous state-of-the-art methods by a large margin (e.g., +3.2% on RefCOCO+ and +3.4% on RefCOCOg). In addition, our model is more interpretable because it explicitly locates the object, which is also supported by visualization experiments. We believe this framework is promising as a strong baseline for referring image segmentation.
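The three stages named in the abstract (fuse, locate, segment) can be sketched in a few lines of plain Python. This is only a toy illustration under stated assumptions, not the authors' implementation: the element-wise fusion, the sigmoid position prior, the single linear layer used as the segmentation head, and every function name (`fuse`, `locate`, `segment`, `locate_then_segment`) are hypothetical stand-ins chosen for readability.

```python
import math

def fuse(visual, text):
    # Cross-modal fusion (assumed here: element-wise product at each pixel).
    return [[[v * t for v, t in zip(pixel, text)] for pixel in row]
            for row in visual]

def locate(fused):
    # Position prior: sigmoid of the summed fused response, one value per pixel.
    return [[1.0 / (1.0 + math.exp(-sum(pixel))) for pixel in row]
            for row in fused]

def segment(fused, prior, weights, bias):
    # Light-weight head (assumed: one linear layer over the fused channels
    # with the position prior appended as an extra channel).
    mask = []
    for fused_row, prior_row in zip(fused, prior):
        out_row = []
        for pixel, p in zip(fused_row, prior_row):
            logit = sum(w * f for w, f in zip(weights, pixel + [p])) + bias
            out_row.append(1 if logit > 0 else 0)
        mask.append(out_row)
    return mask

def locate_then_segment(visual, text, weights, bias):
    fused = fuse(visual, text)
    prior = locate(fused)
    return prior, segment(fused, prior, weights, bias)

# A 2x2 "image" with 2 feature channels and a 2-dimensional sentence vector.
visual = [[[2.0, 0.0], [0.0, 2.0]],
          [[1.0, 2.0], [3.0, 0.0]]]
text = [1.0, -1.0]
prior, mask = locate_then_segment(visual, text,
                                  weights=[0.0, 0.0, 1.0], bias=-0.5)
print(mask)  # [[1, 0], [0, 1]]
```

With these toy weights the head looks only at the prior channel, so the mask simply keeps the pixels where the position prior exceeds 0.5. The point of the sketch is the data flow: the localization step produces an explicit, inspectable heatmap before any mask is generated.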

Ya Jing, Tao Kong, Wei Wang, Liang Wang, Lei Li, Tieniu Tan • 2021

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Referring Image Segmentation | RefCOCO (val) | mIoU | 65.43 | 259 |
| Referring Expression Segmentation | RefCOCO (testA) | cIoU | 67.76 | 257 |
| Referring Image Segmentation | RefCOCO+ (test-B) | mIoU | 48.02 | 252 |
| Referring Image Segmentation | RefCOCO (test A) | mIoU | 67.76 | 230 |
| Referring Expression Segmentation | RefCOCO+ (testA) | cIoU | 58.32 | 230 |
| Referring Expression Segmentation | RefCOCO+ (val) | cIoU | 54.21 | 223 |
| Referring Expression Segmentation | RefCOCO (testB) | cIoU | 63.08 | 213 |
| Referring Expression Segmentation | RefCOCO (val) | cIoU | 65.43 | 212 |
| Referring Expression Segmentation | RefCOCO+ (testB) | cIoU | 48.02 | 210 |
| Referring Image Segmentation | RefCOCO+ (val) | mIoU | 54.21 | 179 |
Showing 10 of 37 rows
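The benchmark rows use two metric labels, mIoU and cIoU, which this page does not define. Assuming the definitions commonly used for referring segmentation (mIoU averages the per-sample IoU, while cIoU, also called overall IoU, divides the total intersection by the total union over all samples), the difference can be sketched as follows. The helper names and the handling of empty unions are illustrative assumptions, not part of any benchmark's official evaluation code.

```python
def intersection_union(pred, gt):
    # Count intersecting and united foreground pixels of two binary masks.
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    return inter, union

def mean_iou(samples):
    # mIoU: average of per-sample IoU (assumption: empty union counts as 0).
    ious = []
    for pred, gt in samples:
        inter, union = intersection_union(pred, gt)
        ious.append(inter / union if union else 0.0)
    return sum(ious) / len(ious)

def cumulative_iou(samples):
    # cIoU: total intersection over total union across all samples.
    total_inter = total_union = 0
    for pred, gt in samples:
        inter, union = intersection_union(pred, gt)
        total_inter += inter
        total_union += union
    return total_inter / total_union if total_union else 0.0

samples = [
    ([[1, 0], [0, 0]], [[1, 1], [0, 0]]),  # small object: IoU = 1/2
    ([[1, 1], [1, 1]], [[1, 1], [1, 0]]),  # large object: IoU = 3/4
]
print(mean_iou(samples))        # 0.625
print(cumulative_iou(samples))  # 4/6, about 0.667
```

Because cIoU pools pixels across samples, larger objects contribute more to it, whereas mIoU weights every sample equally.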
