
Learning Two-Branch Neural Networks for Image-Text Matching Tasks

About

Image-language matching tasks have recently attracted a lot of attention in the computer vision field. These tasks include image-sentence matching (given an image query, retrieving relevant sentences, and vice versa) and region-phrase matching, or visual grounding (matching a phrase to relevant image regions). This paper investigates two-branch neural networks for learning the similarity between these two data modalities. We propose two network structures that produce different output representations. The first, referred to as an embedding network, learns an explicit shared latent embedding space with a maximum-margin ranking loss and novel neighborhood constraints. Instead of standard triplet sampling, we perform improved neighborhood sampling that takes neighborhood information into account when constructing mini-batches. The second structure, referred to as a similarity network, fuses the two branches via element-wise product and is trained with a regression loss to directly predict a similarity score. Extensive experiments show that our networks achieve high accuracy for phrase localization on the Flickr30K Entities dataset and for bidirectional image-sentence retrieval on the Flickr30K and MSCOCO datasets.
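The two branch designs in the abstract can be illustrated with a small plain-Python sketch. It is not the authors' implementation and makes several assumptions: embeddings are compared by Euclidean distance after L2 normalization, a single margin is shared by all terms, the weights `lam_bidir` and `lam_nbr` are hypothetical values, every negative in the mini-batch is summed (no sampling strategy), and the branch outputs are taken as given rather than produced by learned layers. The similarity network's "regression loss" is modeled here as a logistic loss on ±1 labels, which is an assumption. All function names are hypothetical.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length (a zero vector is returned unchanged).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dist(a, b):
    # Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hinge(margin, d_pos, d_neg):
    # Zero only if the positive is closer than the negative by at least `margin`.
    return max(0.0, margin + d_pos - d_neg)


def embedding_loss(images, sentences, owner, margin=0.1, lam_bidir=1.5, lam_nbr=0.05):
    """Embedding-network loss over one mini-batch (simplified sketch).

    owner[j] is the index of the image that sentence j describes.
    margin, lam_bidir and lam_nbr are hypothetical hyperparameters.
    """
    x = [l2_normalize(v) for v in images]
    y = [l2_normalize(v) for v in sentences]
    loss = 0.0
    # Image -> sentence ranking: an image's own sentences must beat other images' sentences.
    for i, xi in enumerate(x):
        for j in range(len(y)):
            if owner[j] != i:
                continue
            for k in range(len(y)):
                if owner[k] != i:
                    loss += hinge(margin, dist(xi, y[j]), dist(xi, y[k]))
    # Sentence -> image ranking: a sentence's own image must beat every other image.
    for j, yj in enumerate(y):
        for k in range(len(x)):
            if k != owner[j]:
                loss += lam_bidir * hinge(margin, dist(x[owner[j]], yj), dist(x[k], yj))
    # Neighborhood constraint: sentences describing the same image should be
    # closer to each other than to sentences describing other images.
    for j, yj in enumerate(y):
        for n in range(len(y)):
            if n == j or owner[n] != owner[j]:
                continue
            for k in range(len(y)):
                if owner[k] != owner[j]:
                    loss += lam_nbr * hinge(margin, dist(yj, y[n]), dist(yj, y[k]))
    return loss


def similarity_logit(image_feat, sentence_feat, w, b):
    # Similarity network: fuse the L2-normalized branch outputs by element-wise
    # product, then map the fused vector to one score with a linear layer (w, b).
    fused = [p * q for p, q in zip(l2_normalize(image_feat), l2_normalize(sentence_feat))]
    return sum(wi * fi for wi, fi in zip(w, fused)) + b


def logistic_loss(logit, label):
    # label is +1 for a matching pair, -1 otherwise.
    # Computes log(1 + exp(-label * logit)) without overflowing for large inputs.
    z = -label * logit
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))
```

For example, with two images and three sentences where sentences 0 and 1 describe image 0, `embedding_loss([[1, 0], [0, 1]], [[1, 0], [0.9, 0.1], [0, 1]], [0, 0, 1])` is zero because every ranking and neighborhood constraint already holds with margin; reassigning the sentences to the wrong images makes it positive. The embedding network outputs a distance in a shared space, while the similarity network outputs a score directly, which is the representational difference the abstract describes.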

Liwei Wang, Yin Li, Jing Huang, Svetlana Lazebnik (2017)

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text-to-Image Retrieval | Flickr30k (test) | Recall@1 | 29.7 | 423 |
| Image-to-Text Retrieval | Flickr30k (test) | R@1 | 40.3 | 370 |
| Text-to-Image Retrieval | MS-COCO | R@5 | 75.2 | 79 |
| Image-to-Text Retrieval | COCO-CN | -- | -- | 48 |
| Referring Expression Comprehension | ReferItGame (test) | Top-1 Acc | 34.54 | 47 |
| Visual Grounding | Flickr30K Entities (test) | Accuracy | 51.05 | 29 |
| Image-Text Retrieval | MSCOCO (test) | EN Retrieval Score | 76.8 | 28 |
| Visual Grounding | ReferItGame (test) | Pr@0.5 | 0.3454 | 26 |
| Image-Text Retrieval | Flickr30k (test) | -- | -- | 21 |
| Phrase grounding | Flickr30K | Accuracy | 51.05 | 20 |
Showing 10 of 14 rows
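The Recall@K (R@K) entries for the retrieval benchmarks are conventionally the percentage of queries for which at least one ground-truth match appears among the top K retrieved items. A minimal sketch, assuming a precomputed query-by-candidate similarity matrix and a set of correct candidates per query (an image usually has several correct captions); the function name and the toy data are hypothetical, and the benchmarks' exact evaluation protocols are not stated on this page.

```python
def recall_at_k(scores, gt, k):
    """Percentage of queries with at least one ground-truth item in the top k.

    scores[q][c]: similarity between query q and candidate c (higher is better)
    gt[q]: set of candidate indices that correctly match query q
    """
    hits = 0
    for q, row in enumerate(scores):
        # Rank candidates by descending similarity and keep the top k indices.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if gt[q] & set(top_k):
            hits += 1
    return 100.0 * hits / len(scores)


scores = [[0.9, 0.1, 0.3], [0.2, 0.4, 0.8], [0.5, 0.6, 0.7]]
gt = [{0}, {1}, {2}]
print(recall_at_k(scores, gt, 1))  # query 1 ranks its match second, so 2 of 3 hit
print(recall_at_k(scores, gt, 2))  # every query's match is within the top 2
```

The same function covers both directions: for image-to-text retrieval the rows are images and the candidates are sentences, and for text-to-image retrieval the matrix is transposed.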
