
Beyond One-to-One: Rethinking the Referring Image Segmentation

About

Referring image segmentation aims to segment the target object referred to by a natural language expression. However, previous methods rely on the strong assumption that one sentence must describe exactly one target in the image, which is often not the case in real-world applications. As a result, such methods fail when an expression refers to no object or to multiple objects. In this paper, we address this issue from two perspectives. First, we propose a Dual Multi-Modal Interaction (DMMI) Network, which contains two decoder branches and enables information flow in two directions. In the text-to-image decoder, the text embedding is used to query the visual feature and localize the corresponding target. Meanwhile, the image-to-text decoder reconstructs the erased entity phrase conditioned on the visual feature. In this way, visual features are encouraged to contain the critical semantic information about the target entity, which in turn supports accurate segmentation in the text-to-image decoder. Second, we collect a new challenging but realistic dataset called Ref-ZOM, which includes image-text pairs under different settings. Extensive experiments demonstrate that our method achieves state-of-the-art performance on different datasets, and that the Ref-ZOM-trained model performs well on various types of text inputs. Code and datasets are available at https://github.com/toggle1995/RIS-DMMI.
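To make the "text embedding queries the visual feature" idea concrete, here is a minimal sketch using scaled dot-product cross-attention with a single text vector as the query. This is an assumption for illustration only: the abstract does not say which attention mechanism DMMI uses, and the function names, toy vectors, and dimensions below are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def text_queries_image(text_vec, visual_feats):
    """Return attention weights of one text vector over visual locations.

    text_vec:     list of floats, a (hypothetical) sentence embedding.
    visual_feats: list of feature vectors, one per flattened image location.
    This is generic cross-attention, not the DMMI decoder itself.
    """
    dim = len(text_vec)
    scores = [
        sum(t * v for t, v in zip(text_vec, feat)) / math.sqrt(dim)
        for feat in visual_feats
    ]
    return softmax(scores)


# Toy data (assumed): three image locations with 2-dimensional features.
text_vec = [1.0, 0.0]
visual_feats = [[0.1, 0.9], [2.0, 0.1], [0.0, 0.5]]

weights = text_queries_image(text_vec, visual_feats)
# Location whose feature aligns best with the text embedding.
best_location = max(range(len(weights)), key=weights.__getitem__)
```

The weights form a distribution over image locations, so the highest-weighted location can be read as a coarse localization of the referred target. A real segmentation decoder would feed such attended features into further layers to produce a pixel mask.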

Yutao Hu, Qixiong Wang, Wenqi Shao, Enze Xie, Zhenguo Li, Jungong Han, Ping Luo • 2023

Related benchmarks

Task | Dataset | Result | Rank
Referring Image Segmentation | RefCOCO+ (test-B) | mIoU 57 | 200
Referring Image Segmentation | RefCOCO (val) | -- | 197
Referring Image Segmentation | RefCOCO (test A) | mIoU 77.1 | 178
Referring Image Segmentation | RefCOCO (test-B) | -- | 119
Referring Image Segmentation | RefCOCO+ (val) | -- | 117
Generalized Referring Expression Segmentation | gRefCOCO (testA) | cIoU 68.83 | 115
Generalized Referring Expression Segmentation | gRefCOCO (val) | cIoU 62.78 | 98
Generalized Referring Expression Segmentation | gRefCOCO (testB) | cIoU 60.01 | 97
Referring Image Segmentation | RefCOCO+ (test-A) | oIoU 69.73 | 89
Referring Image Segmentation | RefCOCO+ (testA) | mIoU 69.7 | 45
Showing 10 of 25 rows
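
The results above are reported as mIoU, oIoU, and cIoU. As a rough guide to how these differ, the sketch below follows the definitions commonly used in the referring segmentation literature: mIoU averages the per-sample IoU, while oIoU (overall) and cIoU (cumulative) divide the total intersection by the total union across all samples. The exact evaluation protocol behind each row is not stated here, so treat this as an assumption. The helper names and toy masks are hypothetical, and scoring an empty prediction against an empty ground truth as 1.0 is one possible convention for no-target expressions.

```python
def intersection_and_union(pred, gt):
    """Count intersecting and united pixels of two flattened binary masks."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter, union


def mean_iou(pairs, empty_score=1.0):
    """Average of per-sample IoU (mIoU).

    empty_score is the value assigned when both masks are empty
    (an assumed convention for expressions that refer to no object).
    """
    ious = []
    for pred, gt in pairs:
        inter, union = intersection_and_union(pred, gt)
        ious.append(inter / union if union else empty_score)
    return sum(ious) / len(ious)


def cumulative_iou(pairs):
    """Total intersection over total union across all samples (oIoU / cIoU)."""
    total_inter = 0
    total_union = 0
    for pred, gt in pairs:
        inter, union = intersection_and_union(pred, gt)
        total_inter += inter
        total_union += union
    return total_inter / total_union if total_union else 0.0


# Toy (prediction, ground truth) pairs as flattened 4-pixel masks.
pairs = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),  # partial overlap
    ([0, 0, 1, 1], [0, 0, 1, 1]),  # exact match
    ([0, 0, 0, 0], [0, 0, 0, 0]),  # no target, nothing predicted
]

miou = mean_iou(pairs)
ciou = cumulative_iou(pairs)
```

Because the cumulative form pools pixels over the whole set, large objects weigh more heavily than small ones, whereas mIoU gives every sample equal weight. This is one reason the two numbers can differ on the same dataset split.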
