
Imagination improves Multimodal Translation

About

We decompose multimodal translation into two sub-tasks: learning to translate and learning visually grounded representations. In a multitask learning framework, translations are learned in an attention-based encoder-decoder, and grounded representations are learned through image representation prediction. Our approach improves translation performance compared to the state of the art on the Multi30K dataset. Furthermore, it is equally effective if we train the image prediction task on the external MS COCO dataset, and we find improvements if we train the translation model on the external News Commentary parallel text.
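The two sub-tasks above can be sketched as a joint objective: a cross-entropy loss for the translation decoder, and a max-margin loss that pushes the encoder's predicted image vector toward the true image feature and away from contrastive examples. A minimal NumPy sketch follows; the function names, the interpolation weight `w`, and the margin value are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def translation_loss(logits, target_ids):
    """Learning to translate: mean cross-entropy over decoder steps.

    logits: (steps, vocab) unnormalized scores; target_ids: (steps,) gold ids.
    """
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = exp / exp.sum(axis=-1, keepdims=True)
    nll = -np.log(probs[np.arange(len(target_ids)), target_ids])
    return nll.mean()

def imagination_loss(pred_vec, image_vec, contrast_vecs, margin=0.1):
    """Learning grounded representations: margin-based image prediction.

    The predicted vector should be closer (by cosine) to the true image
    feature than to each contrastive image feature, by at least `margin`.
    """
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    pos = cos(pred_vec, image_vec)
    losses = [max(0.0, margin - pos + cos(pred_vec, c)) for c in contrast_vecs]
    return sum(losses) / len(losses)

def multitask_loss(logits, target_ids, pred_vec, image_vec, contrast_vecs, w=0.5):
    """Interpolate the two task losses (w is an assumed hyperparameter)."""
    return (w * translation_loss(logits, target_ids)
            + (1 - w) * imagination_loss(pred_vec, image_vec, contrast_vecs))
```

In this framing the encoder is shared: the translation decoder and the image-prediction head each back-propagate through it, so the source representations become visually grounded while remaining useful for translation.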

Desmond Elliott, Ákos Kádár • 2017

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Multimodal Machine Translation | Multi30K (test) | BLEU-4: 37.8 | 139 |
| Multimodal Machine Translation (English-German) | Multi30K 2016 (test) | BLEU: 41.31 | 52 |
| Multimodal Machine Translation | Multi30k En-De 2017 (test) | METEOR: 61.29 | 45 |
| Multimodal Machine Translation | Multi30k En-Fr 2017 (test) | METEOR: 76.03 | 31 |
| Multimodal Machine Translation | Multi30k En-Fr 2016 (test) | METEOR: 81.2 | 30 |
| Machine Translation | Multi30k En→Fr v1 2017 (test) | BLEU: 54.07 | 30 |
| Machine Translation | Multi30k Task1 (en-de) | BLEU: 41.31 | 26 |
| Machine Translation | Multi30k Task1 en-fr | BLEU: 61.9 | 25 |
| Machine Translation | Multi30k M30kT (test) | BLEU: 32.89 | 19 |
| Multimodal Machine Translation | MSCOCO EN-FR (test) | BLEU: 44.81 | 19 |
Showing 10 of 17 rows.
