
Discrete-continuous Action Space Policy Gradient-based Attention for Image-Text Matching

About

Image-text matching is an important multi-modal task with a wide range of applications. It aims to match images and text that carry similar semantic information. Existing approaches do not explicitly transform the different modalities into a common space. Meanwhile, the attention mechanism that is widely used in image-text matching models is trained without supervision. We propose a novel attention scheme which projects the image and text embeddings into a common space and optimises the attention weights directly towards the evaluation metrics. The proposed attention scheme can be considered a kind of supervised attention that requires no additional annotations. It is trained via a novel discrete-continuous action space policy gradient algorithm, which is more effective in modelling complex action spaces than previous continuous action space policy gradient methods. We evaluate the proposed method on two widely used benchmark datasets, Flickr30k and MS-COCO, outperforming the previous approaches by a large margin.

Shiyang Yan, Li Yu, Yuan Xie • 2021

Related benchmarks

Task               Dataset                      Result     Rank
Image Retrieval    Flickr30K                    R@1 62.2   144
Image Retrieval    MS-COCO 1K (test)            R@1 63.9   128
Caption Retrieval  MS COCO Karpathy 1k (test)   R@1 84     62
Caption Retrieval  Flickr30K                    R@1 82.8   23
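The R@1 results listed for these benchmarks are Recall@K scores with K = 1: the percentage of queries for which a correct match appears among the top K ranked candidates. The following is a minimal sketch of that metric, not the authors' evaluation code; the function name `recall_at_k`, the nested-list similarity format, and the toy scores are all assumptions made for illustration.

```python
def recall_at_k(similarity, ground_truth, k=1):
    """Percentage of queries with at least one correct candidate in the top k.

    similarity[i][j] is the score between query i and candidate j
    (higher means more similar). ground_truth[i] is the set of
    candidate indices considered correct for query i.
    """
    hits = 0
    for scores, correct in zip(similarity, ground_truth):
        # Rank candidate indices from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if any(j in correct for j in ranked[:k]):
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy data (made up): 3 text queries scored against 3 images.
sim = [
    [0.9, 0.1, 0.3],  # query 0: correct image 0 is ranked first
    [0.2, 0.4, 0.8],  # query 1: correct image 1 is ranked second
    [0.1, 0.2, 0.7],  # query 2: correct image 2 is ranked first
]
gt = [{0}, {1}, {2}]

r1 = recall_at_k(sim, gt, k=1)
r2 = recall_at_k(sim, gt, k=2)
print(f"R@1 = {r1:.1f}, R@2 = {r2:.1f}")
```

For caption retrieval, where each image typically has several reference captions, `ground_truth[i]` would hold more than one index, and a query counts as a hit if any of them is in the top K.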
