
Position Focused Attention Network for Image-Text Matching

About

Image-text matching has recently attracted considerable attention in the computer vision field. The key challenge in this cross-domain problem is how to accurately measure the similarity between visual and textual content, which demands a fine-grained understanding of both modalities. In this paper, we propose a novel position focused attention network (PFAN) to investigate the relation between the visual and the textual views. We integrate object position cues to enhance visual-text joint-embedding learning. We first split each image into blocks, from which we infer the relative position of each region in the image. An attention mechanism then models the relations between each image region and the blocks to generate a position feature, which is used to enhance the region representation and to model a more reliable relationship between the image and the sentence. Experiments on the popular Flickr30K and MS-COCO datasets show the effectiveness of the proposed method. Beyond these public datasets, we also conduct experiments on a large-scale news dataset we collected (Tencent-News) to validate the practical value of the proposed method. To the best of our knowledge, this is the first attempt to evaluate image-text matching in a practical application. Our method achieves state-of-the-art performance on all three datasets.

Yaxiong Wang, Hao Yang, Xueming Qian, Lin Ma, Jing Lu, Biao Li, Xin Fan · 2019
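The position-attention idea in the abstract (split the image into blocks, relate each region to those blocks, and produce a position feature that enriches the region) can be illustrated with a small sketch. This is not PFAN's implementation. The grid size, the use of the region-block overlap fraction as a log prior added to a dot-product score, the block embeddings, and concatenation as the fusion step are all assumptions made for illustration, and every function name below is hypothetical.

```python
import math

def block_overlaps(box, grid, img_w, img_h):
    """Fraction of a region box (x1, y1, x2, y2) lying inside each cell of a
    grid x grid partition of the image, in row-major order (hypothetical helper)."""
    x1, y1, x2, y2 = box
    area = max((x2 - x1) * (y2 - y1), 1e-9)
    bw, bh = img_w / grid, img_h / grid
    fracs = []
    for r in range(grid):
        for c in range(grid):
            ox = max(0.0, min(x2, (c + 1) * bw) - max(x1, c * bw))
            oy = max(0.0, min(y2, (r + 1) * bh) - max(y1, r * bh))
            fracs.append(ox * oy / area)
    return fracs

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def position_feature(region_feat, box, block_embs, grid, img_w, img_h):
    """Assumed scoring: dot(region, block) + log(overlap). Blocks the region
    does not touch get log(1e-6), so they are heavily down-weighted."""
    overlaps = block_overlaps(box, grid, img_w, img_h)
    scores = []
    for emb, ov in zip(block_embs, overlaps):
        dot = sum(a * b for a, b in zip(region_feat, emb))
        scores.append(dot + math.log(ov + 1e-6))
    weights = softmax(scores)
    dim = len(block_embs[0])
    # Attention-weighted sum of block embeddings = the position feature.
    return [sum(w * emb[d] for w, emb in zip(weights, block_embs)) for d in range(dim)]

def enhance_region(region_feat, pos_feat):
    """Assumed fusion step: concatenate the visual and position features."""
    return list(region_feat) + list(pos_feat)
```

For example, with a 2 x 2 grid on a 100 x 100 image, a box covering the top-left quarter overlaps only the first block, so its position feature is almost exactly that block's embedding. In a trained model the block embeddings would be learned parameters and the features high-dimensional; the toy sizes here only show the data flow.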

Related benchmarks

Task | Dataset | Metric | Score | Rank
Text-to-Image Retrieval | Flickr30K | R@1 | 50.4 | 460
Image-to-Text Retrieval | Flickr30K 1K (test) | R@1 | 70 | 439
Text-to-Image Retrieval | Flickr30k (test) | R@1 | 50.4 | 423
Image-to-Text Retrieval | Flickr30k (test) | R@1 | 70 | 370
Image Retrieval | Flickr30k (test) | R@1 | 50.4 | 195
Image Retrieval | MS-COCO 1K (test) | R@1 | 61.6 | 128
Text-to-Image Retrieval | MSCOCO (1K test) | R@1 | 61.6 | 104
Image-to-Text Retrieval | MSCOCO (1K test) | R@1 | 76.5 | 82
Text-to-Image Retrieval | MS-COCO | R@5 | 89.6 | 79
Text Retrieval | MS-COCO 1K (test) | R@1 | 76.5 | 53
(10 of 16 rows shown)
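The R@1 and R@5 scores above are Recall@K values: the percentage of queries for which at least one correct item appears among the top K retrieved candidates. A minimal sketch of that computation follows, assuming a precomputed query-by-candidate similarity matrix; the function name and data layout are illustrative, not taken from the paper or any benchmark toolkit. For image-to-text retrieval on Flickr30K and MS-COCO each image typically has five reference captions, so `gt[q]` may hold several indices.

```python
def recall_at_k(sim, gt, k):
    """Recall@K as a percentage.

    sim[q][c]: similarity between query q and candidate c (higher is better).
    gt[q]: set of indices of the correct candidates for query q.
    A query counts as a hit if any correct candidate is in its top k.
    """
    hits = 0
    for q, row in enumerate(sim):
        # Rank candidate indices by descending similarity and keep the top k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if any(c in gt[q] for c in top_k):
            hits += 1
    return 100.0 * hits / len(sim)


# Toy example: 3 queries, 3 candidates, one correct candidate per query.
sim = [[0.9, 0.1, 0.2],
       [0.3, 0.2, 0.8],
       [0.5, 0.6, 0.4]]
gt = [{0}, {1}, {1}]
print(recall_at_k(sim, gt, 1))
print(recall_at_k(sim, gt, 3))
```

Here the second query ranks its correct candidate last, so it is missed at K = 1 and K = 2 but counted at K = 3, which is why Recall@K can only stay the same or rise as K grows.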
