RESMatch: Referring Expression Segmentation in a Semi-Supervised Manner
About
Referring expression segmentation (RES), a task that involves localizing specific instance-level objects based on free-form linguistic descriptions, has emerged as a crucial frontier in human-AI interaction. It demands an intricate understanding of both visual and textual contexts and often requires extensive training data. This paper introduces RESMatch, the first semi-supervised learning (SSL) approach for RES, aimed at reducing reliance on exhaustive data annotation. Extensive validation on multiple RES datasets demonstrates that RESMatch significantly outperforms baseline approaches, establishing a new state of the art. Although existing SSL techniques are effective in image segmentation, we find that they fall short in RES. To address challenges such as comprehending free-form linguistic descriptions and handling the variability in object attributes, RESMatch introduces three adaptations: revised strong perturbation, text augmentation, and adjustments for pseudo-label quality and strong-weak supervision. This pioneering work lays the groundwork for future research in semi-supervised learning for referring expression segmentation.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Referring Expression Segmentation | RefCOCOg UMD (val) | mIoU | 45.24 | 52 |
| Referring Expression Segmentation | RefCOCOg UMD (test-u) | mIoU | 47.39 | 46 |
| Referring Expression Segmentation | RefCOCO UMD (testB) | Overall IoU | 54.17 | 34 |
| Referring Expression Segmentation | refCOCO+ UMD (testB) | Overall IoU | 37.97 | 34 |
| Referring Expression Segmentation | RefCOCO UMD partition (test A) | Overall IoU | 62.56 | 34 |
| Referring Expression Segmentation | refCOCO+ UMD (val) | Overall IoU | 45.03 | 34 |
| Referring Expression Segmentation | refCOCO+ UMD (testA) | Overall IoU | 51.22 | 34 |
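
The table reports two different metrics, mIoU and Overall IoU, which are not directly comparable. As a minimal sketch of how they are commonly defined in the RES literature (this is not taken from the RESMatch code; the function names and the handling of empty unions are assumptions made for illustration), mIoU averages the per-sample IoU, while Overall IoU divides the summed intersection by the summed union across all samples:

```python
import numpy as np


def intersection_and_union(pred, target):
    """Return (intersection, union) pixel counts for one pair of binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = int(np.logical_and(pred, target).sum())
    union = int(np.logical_or(pred, target).sum())
    return inter, union


def mean_iou(preds, targets):
    """mIoU: average of the per-sample IoU values."""
    ious = []
    for pred, target in zip(preds, targets):
        inter, union = intersection_and_union(pred, target)
        # Assumption: a sample with an empty union contributes an IoU of 0.
        ious.append(inter / union if union > 0 else 0.0)
    return float(np.mean(ious))


def overall_iou(preds, targets):
    """Overall IoU: total intersection divided by total union over all samples."""
    total_inter = 0
    total_union = 0
    for pred, target in zip(preds, targets):
        inter, union = intersection_and_union(pred, target)
        total_inter += inter
        total_union += union
    # Assumption: return 0 when no sample has any foreground pixels.
    return total_inter / total_union if total_union > 0 else 0.0
```

Because Overall IoU pools pixels before dividing, large objects weigh more heavily than small ones, whereas mIoU treats every sample equally. Both functions return fractions in [0, 1]; the table values are the same quantities expressed as percentages.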