
Graph Structured Network for Image-Text Matching

About

Image-text matching has received growing interest because it bridges vision and language. The key challenge lies in learning the correspondence between image and text. Existing works learn coarse correspondence based on object co-occurrence statistics but fail to learn fine-grained phrase correspondence. In this paper, we present a novel Graph Structured Matching Network (GSMN) to learn fine-grained correspondence. GSMN explicitly models objects, relations and attributes as structured phrases. This not only allows the correspondences of objects, relations and attributes to be learned separately, but also helps in learning the fine-grained correspondence of structured phrases. This is achieved by node-level matching and structure-level matching. Node-level matching associates each node, which can be an object, a relation or an attribute, with its relevant nodes from the other modality. Structure-level matching then lets the associated nodes jointly infer fine-grained correspondence by fusing neighborhood associations. Comprehensive experiments show that GSMN outperforms state-of-the-art methods, with relative Recall@1 improvements of nearly 7% and 2% on Flickr30K and MSCOCO, respectively. Code will be released at: https://github.com/CrossmodalGroup/GSMN.
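The two matching stages can be illustrated with a toy sketch. It is not the GSMN implementation: it assumes each graph node is already a feature vector, uses temperature-scaled cosine-similarity attention for node-level matching, and replaces the learned graph convolution with one round of mean aggregation over neighbors for structure-level matching. The function names, the `temperature` value and the toy graphs are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid division by zero
    return [x / norm for x in v]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def node_level_matching(query_nodes: List[Vector], key_nodes: List[Vector],
                        temperature: float = 10.0) -> List[Vector]:
    """Hypothetical node-level step: each query node (e.g. a text object,
    relation or attribute) attends over all nodes of the other modality.
    Returns one element-wise matching vector per query node; its sum is the
    cosine similarity between the node and its attended counterpart."""
    keys = [l2_normalize(k) for k in key_nodes]
    matches = []
    for q in query_nodes:
        qn = l2_normalize(q)
        sims = [sum(a * b for a, b in zip(qn, k)) for k in keys]
        weights = softmax([temperature * s for s in sims])
        attended = [sum(w * k[d] for w, k in zip(weights, keys))
                    for d in range(len(qn))]
        attended = l2_normalize(attended)
        matches.append([a * b for a, b in zip(qn, attended)])
    return matches


def structure_level_matching(matches: List[Vector],
                             edges: List[Tuple[int, int]]) -> float:
    """Hypothetical structure-level step: fuse each node's matching vector
    with those of its graph neighbors (mean aggregation, self included),
    then average the fused vectors into one image-text similarity score.
    A trained model would use learned layers here instead."""
    n = len(matches)
    neighbors = {i: {i} for i in range(n)}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    fused = []
    for i in range(n):
        group = [matches[j] for j in neighbors[i]]
        fused.append([sum(col) / len(group) for col in zip(*group)])
    return sum(sum(vec) for vec in fused) / n


if __name__ == "__main__":
    # Toy 3-d features: three image regions and three text nodes
    # (e.g. an object, an attribute and a relation), all made up.
    image_nodes = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
    text_nodes = [[0.9, 0.2, 0.0], [0.1, 0.9, 0.1], [0.0, 0.1, 0.8]]
    text_edges = [(0, 1), (0, 2)]  # object linked to attribute and relation
    matches = node_level_matching(text_nodes, image_nodes)
    print(structure_level_matching(matches, text_edges))
```

Matching a graph against an identical one yields a score close to 1, while a node with no counterpart in the other modality contributes 0. The neighbor fusion is what lets a weakly matched node borrow evidence from well-matched neighbors in its phrase.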

Chunxiao Liu, Zhendong Mao, Tianzhu Zhang, Hongtao Xie, Bin Wang, Yongdong Zhang • 2020

Related benchmarks

Task                     Dataset              Metric  Score  Rank
Text-to-Image Retrieval  Flickr30K            R@1     57.4   460
Image-to-Text Retrieval  Flickr30K 1K (test)  R@1     76.4   439
Text-to-Image Retrieval  Flickr30k (test)     R@1     57.4   423
Image-to-Text Retrieval  Flickr30K            R@1     76.4   379
Text-to-Image Retrieval  Flickr30K 1K (test)  R@1     57.4   375
Image-to-Text Retrieval  Flickr30k (test)     R@1     76.4   370
Image Retrieval          Flickr30k (test)     R@1     57.4   195
Image-to-Text Retrieval  MS-COCO 1K (test)    R@1     78.4   121
Text-to-Image Retrieval  MS-COCO              R@5     90.1   79
Image Retrieval          Flickr30K 1K (test)  R@1     57.4   70

(10 of 19 rows shown)
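The scores above are Recall@K (R@K) values: the percentage of queries for which at least one correct item appears among the top K retrieved candidates. A minimal sketch of that computation follows, assuming a precomputed query-by-candidate similarity matrix and a set of correct candidate indices per query (in image-to-text retrieval on Flickr30K and MS-COCO, each image has five reference captions, any of which counts as a hit). The function name and toy data are illustrative only and are not taken from the GSMN code.

```python
from typing import Sequence, Set


def recall_at_k(similarity: Sequence[Sequence[float]],
                ground_truth: Sequence[Set[int]], k: int) -> float:
    """Percentage of queries whose top-k ranked candidates contain at least
    one correct match. similarity[i][j] scores query i against candidate j;
    ground_truth[i] holds the indices of the correct candidates for query i."""
    hits = 0
    for i, row in enumerate(similarity):
        # Rank candidate indices by descending similarity.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if ground_truth[i] & set(ranked[:k]):
            hits += 1
    return 100.0 * hits / len(similarity)


if __name__ == "__main__":
    sim = [[0.9, 0.1, 0.3],
           [0.2, 0.4, 0.8],
           [0.5, 0.6, 0.1]]
    gt = [{0}, {1}, {1}]
    for k in (1, 2):
        print(f"R@{k} = {recall_at_k(sim, gt, k):.1f}")
    # R@1 = 66.7
    # R@2 = 100.0
```

On the toy matrix, the second query ranks its correct candidate second, so it counts toward R@2 but not R@1.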
