Imagination improves Multimodal Translation
About
We decompose multimodal translation into two sub-tasks: learning to translate and learning visually grounded representations. In a multitask learning framework, translations are learned in an attention-based encoder-decoder, and grounded representations are learned through image representation prediction. Our approach improves translation performance compared to the state of the art on the Multi30K dataset. Furthermore, it is equally effective if we train the image prediction task on the external MS COCO dataset, and we find improvements if we train the translation model on the external News Commentary parallel text.
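The multitask setup described above can be sketched as a joint objective: a shared source encoder feeds both a translation loss and an "imagination" loss that scores a predicted image vector against the true image. The sketch below is a toy numpy illustration under loose assumptions — a mean-of-embeddings encoder stands in for the paper's recurrent encoder, and all dimensions, names (`W_out`, `W_img`), and the distractor-based margin loss are simplified stand-ins, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions, not taken from the paper)
vocab, hid, img_dim, T = 50, 16, 32, 5

# Shared source encoder: embed tokens and pool by averaging.
# The paper uses an attention-based recurrent encoder; the mean
# of embeddings stands in for it here.
E = rng.normal(size=(vocab, hid))          # shared embedding table
src = rng.integers(0, vocab, size=T)       # a toy source sentence
shared = E[src].mean(axis=0)               # pooled sentence representation

# Task 1: translation -- cross-entropy of a target word under a
# softmax over a toy output projection.
W_out = rng.normal(size=(hid, vocab))
logits = shared @ W_out
probs = np.exp(logits - logits.max())
probs /= probs.sum()
target_word = 7
loss_translate = -np.log(probs[target_word])

# Task 2: "imagination" -- predict the image feature vector and
# score it with a max-margin loss against a distractor image.
W_img = rng.normal(size=(hid, img_dim))
pred = shared @ W_img
true_img = rng.normal(size=img_dim)
distractor = rng.normal(size=img_dim)

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

margin = 0.1
loss_imagine = max(0.0, margin - cos(pred, true_img) + cos(pred, distractor))

# Multitask objective: both losses backpropagate into the shared
# encoder parameters E, which is what grounds the representations.
joint_loss = loss_translate + loss_imagine
print(float(joint_loss))
```

The key design point this illustrates is that only the encoder is shared: the image-prediction head needs no parallel text and the translation decoder needs no images, which is why the two tasks can be trained on disjoint external corpora (MS COCO for images, News Commentary for text).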
Desmond Elliott, Ákos Kádár • 2017
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Multimodal Machine Translation | Multi30K (test) | BLEU-4 | 37.8 | 139 |
| Multimodal Machine Translation (English-German) | Multi30K 2016 (test) | BLEU | 41.31 | 52 |
| Multimodal Machine Translation | Multi30k En-De 2017 (test) | METEOR | 61.29 | 45 |
| Multimodal Machine Translation | Multi30k En-Fr 2017 (test) | METEOR | 76.03 | 31 |
| Multimodal Machine Translation | Multi30k En-Fr 2016 (test) | METEOR | 81.2 | 30 |
| Machine Translation | Multi30k En→Fr v1 2017 (test) | BLEU | 54.07 | 30 |
| Machine Translation | Multi30k Task1 (en-de) | BLEU | 41.31 | 26 |
| Machine Translation | Multi30k Task1 en-fr | BLEU | 61.9 | 25 |
| Machine Translation | Multi30k M30kT (test) | BLEU | 32.89 | 19 |
| Multimodal Machine Translation | MSCOCO EN-FR (test) | BLEU | 44.81 | 19 |
Showing 10 of 17 benchmark rows.