RESMatch: Referring Expression Segmentation in a Semi-Supervised Manner

About

Referring expression segmentation (RES), a task that involves localizing specific instance-level objects based on free-form linguistic descriptions, has emerged as a crucial frontier in human-AI interaction. It demands an intricate understanding of both visual and textual contexts and often requires extensive training data. This paper introduces RESMatch, the first semi-supervised learning (SSL) approach for RES, aimed at reducing reliance on exhaustive data annotation. Although existing SSL techniques are effective in image segmentation, we find that they fall short in RES. To address challenges such as comprehending free-form linguistic descriptions and the variability of object attributes, RESMatch introduces three adaptations: revised strong perturbation, text augmentation, and adjustments for pseudo-label quality and strong-weak supervision. Extensive validation on multiple RES datasets demonstrates that RESMatch significantly outperforms baseline approaches, establishing a new state-of-the-art. This pioneering work lays the groundwork for future research in semi-supervised learning for referring expression segmentation.

Ying Zang, Chenglong Fu, Runlong Cao, Didi Zhu, Min Zhang, Wenjun Hu, Lanyun Zhu, Tianrun Chen • 2024
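
The abstract mentions pseudo-label quality and strong-weak supervision without giving details. As a rough illustration of the general idea only, the sketch below shows a generic FixMatch-style consistency loss: the prediction on a weakly perturbed input supplies a pseudo-label, and the prediction on a strongly perturbed input is trained against it wherever the weak prediction is confident. This is not RESMatch's actual formulation; the function name `pseudo_label_loss` and the `threshold` value are hypothetical.

```python
import math


def pseudo_label_loss(weak_probs, strong_probs, threshold=0.95):
    """Binary cross-entropy of the strong view against weak-view pseudo-labels.

    weak_probs, strong_probs: flat lists of per-pixel foreground probabilities.
    threshold: hypothetical confidence cutoff (an assumption, not from the paper).
    Returns (mean loss over kept pixels, number of kept pixels).
    """
    eps = 1e-7
    total, kept = 0.0, 0
    for p_weak, p_strong in zip(weak_probs, strong_probs):
        # Confidence of the weak prediction in its own hard decision.
        confidence = max(p_weak, 1.0 - p_weak)
        if confidence < threshold:
            continue  # low-confidence pixels contribute no supervision
        label = 1.0 if p_weak >= 0.5 else 0.0
        p = min(max(p_strong, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))
        kept += 1
    return (total / kept if kept else 0.0), kept


loss, kept = pseudo_label_loss([0.99, 0.60, 0.02], [0.90, 0.50, 0.10])
print(kept, round(loss, 4))
```

In this toy call the middle pixel (weak probability 0.60) falls below the cutoff and is ignored, so only two pixels supervise the strong view.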

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Referring Expression Segmentation | RefCOCOg UMD (val) | mIoU 45.24 | 52 |
| Referring Expression Segmentation | RefCOCOg UMD (test-u) | mIoU 47.39 | 46 |
| Referring Expression Segmentation | RefCOCO UMD (testB) | Overall IoU 54.17 | 34 |
| Referring Expression Segmentation | refCOCO+ UMD (testB) | Overall IoU 37.97 | 34 |
| Referring Expression Segmentation | RefCOCO UMD partition (test A) | Overall IoU 62.56 | 34 |
| Referring Expression Segmentation | refCOCO+ UMD (val) | Overall IoU 45.03 | 34 |
| Referring Expression Segmentation | refCOCO+ UMD (testA) | Overall IoU 51.22 | 34 |
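
The results above are reported under two metrics, mIoU and Overall IoU, which this page does not define. Assuming the definitions commonly used in the RES literature (mIoU averages the per-sample IoU, while Overall IoU divides the summed intersection by the summed union across all samples), a minimal sketch of the difference could look as follows. The function name `iou_metrics` and the convention of scoring an empty union as 1.0 are assumptions made for illustration.

```python
def iou_metrics(predictions, targets):
    """Return (mIoU, overall IoU) for paired binary masks.

    predictions, targets: lists of masks, each mask a flat list of 0/1 values.
    """
    per_sample = []
    total_inter = 0
    total_union = 0
    for pred, gt in zip(predictions, targets):
        inter = sum(1 for p, g in zip(pred, gt) if p and g)
        union = sum(1 for p, g in zip(pred, gt) if p or g)
        # Assumed convention: two empty masks count as a perfect match.
        per_sample.append(inter / union if union else 1.0)
        total_inter += inter
        total_union += union
    miou = sum(per_sample) / len(per_sample)
    overall = total_inter / total_union if total_union else 1.0
    return miou, overall


preds = [[1, 1, 0, 0], [1, 1, 1, 1]]
gts = [[1, 0, 0, 0], [1, 1, 1, 1]]
miou, overall = iou_metrics(preds, gts)
print(round(miou, 4), round(overall, 4))
```

Because Overall IoU pools pixels across samples, large objects weigh more heavily in it than in mIoU, so the two columns are not directly comparable.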
