Image Pivoting for Learning Multilingual Multimodal Representations

About

In this paper we propose a model to learn multimodal multilingual representations for matching images and sentences in different languages, with the aim of advancing multilingual versions of image search and image understanding. Our model learns a common representation for images and their descriptions in two different languages (which need not be parallel) by considering the image as a pivot between two languages. We introduce a new pairwise ranking loss function which can handle both symmetric and asymmetric similarity between the two modalities. We evaluate our models on image-description ranking for German and English, and on semantic textual similarity of image descriptions in English. In both cases we achieve state-of-the-art performance.

Spandana Gella, Rico Sennrich, Frank Keller, Mirella Lapata • 2017
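
The abstract mentions a pairwise ranking loss that works with either a symmetric or an asymmetric similarity, with the image acting as a pivot between two languages. The sketch below illustrates that general idea in NumPy; it is not the authors' implementation. The function names, the margin value, the use of cosine as the symmetric similarity, and the order-violation penalty as the asymmetric similarity are all assumptions made for illustration.

```python
import numpy as np

def cosine_sim(a, b):
    """Symmetric similarity: cosine between every row of a and every row of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def order_sim(a, b):
    """Asymmetric similarity (assumed form): negative squared order-violation penalty."""
    # violation[i, j, :] = max(0, b[j] - a[i]), so swapping a and b changes the score
    violation = np.maximum(0.0, b[None, :, :] - a[:, None, :])
    return -np.sum(violation ** 2, axis=2)

def pairwise_ranking_loss(sim, margin=0.1):
    """Max-margin ranking loss; sim[i, i] holds the matched image-caption pairs."""
    pos = np.diag(sim)
    # Each row ranks captions for one image; each column ranks images for one caption.
    cost_rows = np.maximum(0.0, margin - pos[:, None] + sim)
    cost_cols = np.maximum(0.0, margin - pos[None, :] + sim)
    mask = 1.0 - np.eye(sim.shape[0])  # ignore the matched pairs themselves
    return float(np.sum((cost_rows + cost_cols) * mask))

def pivot_loss(img_en, cap_en, img_de, cap_de, sim_fn=cosine_sim, margin=0.1):
    """Hypothetical pivot objective: each language is ranked against images only,
    so the English and German batches do not have to describe the same images."""
    loss_en = pairwise_ranking_loss(sim_fn(img_en, cap_en), margin)
    loss_de = pairwise_ranking_loss(sim_fn(img_de, cap_de), margin)
    return loss_en + loss_de
```

Passing `sim_fn=order_sim` switches the same loss to the asymmetric similarity. Because the two language terms never compare English and German captions directly, any alignment between the languages arises only through the shared image space.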

Related benchmarks

Task                     Dataset           Result                   Rank
Image-to-Text Retrieval  COCO-CN           --                       48
Image-Text Retrieval     MSCOCO (test)     EN Retrieval Score 78.3  28
Image-Text Retrieval     Flickr30k (test)  --                       21
