Imagination improves Multimodal Translation
About
We decompose multimodal translation into two sub-tasks: learning to translate and learning visually grounded representations. In a multitask learning framework, translations are learned in an attention-based encoder-decoder, and grounded representations are learned through image representation prediction. Our approach improves translation performance compared to the state of the art on the Multi30K dataset. Furthermore, it is equally effective if we train the image prediction task on the external MS COCO dataset, and we find improvements if we train the translation model on the external News Commentary parallel text.
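The multitask setup described above can be sketched as a joint objective: a shared source encoder feeds both a translation loss and an "imagination" loss that scores a predicted image vector against the true image. The sketch below is a toy numpy illustration under loose assumptions — a mean-of-embeddings encoder stands in for the paper's recurrent encoder, and all dimensions, names (`W_out`, `W_img`), and the distractor-based margin loss are simplified stand-ins, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions, not taken from the paper)
vocab, hid, img_dim, T = 50, 16, 32, 5

# Shared source encoder: embed tokens and pool by averaging.
# The paper uses an attention-based recurrent encoder; the mean
# of embeddings stands in for it here.
E = rng.normal(size=(vocab, hid))          # shared embedding table
src = rng.integers(0, vocab, size=T)       # a toy source sentence
shared = E[src].mean(axis=0)               # pooled sentence representation

# Task 1: translation -- cross-entropy of a target word under a
# softmax over a toy output projection.
W_out = rng.normal(size=(hid, vocab))
logits = shared @ W_out
probs = np.exp(logits - logits.max())
probs /= probs.sum()
target_word = 7
loss_translate = -np.log(probs[target_word])

# Task 2: "imagination" -- predict the image feature vector and
# score it with a max-margin loss against a distractor image.
W_img = rng.normal(size=(hid, img_dim))
pred = shared @ W_img
true_img = rng.normal(size=img_dim)
distractor = rng.normal(size=img_dim)

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

margin = 0.1
loss_imagine = max(0.0, margin - cos(pred, true_img) + cos(pred, distractor))

# Multitask objective: both losses backpropagate into the shared
# encoder parameters E, which is what grounds the representations.
joint_loss = loss_translate + loss_imagine
print(float(joint_loss))
```

The key design point this illustrates is that only the encoder is shared: the image-prediction head needs no parallel text and the translation decoder needs no images, which is why the two tasks can be trained on disjoint external corpora (MS COCO for images, News Commentary for text).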
Desmond Elliott, Ákos Kádár • 2017
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Multimodal Machine Translation | Multi30K (test) | BLEU-4 | 37.8 | 139 |
| Multimodal Machine Translation (English-German) | Multi30K 2016 (test) | BLEU | 41.31 | 52 |
| Multimodal Machine Translation | Multi30k En-De 2017 (test) | METEOR | 61.29 | 45 |
| Multimodal Machine Translation | Multi30k En-Fr 2017 (test) | METEOR | 76.03 | 31 |
| Multimodal Machine Translation | Multi30k En-Fr 2016 (test) | METEOR | 81.2 | 30 |
| Machine Translation | Multi30k En→Fr v1 2017 (test) | BLEU | 54.07 | 30 |
| Machine Translation | Multi30k Task1 (en-de) | BLEU | 41.31 | 26 |
| Machine Translation | Multi30k Task1 en-fr | BLEU | 61.9 | 25 |
| Machine Translation | Multi30k M30kT (test) | BLEU | 32.89 | 19 |
| Multimodal Machine Translation | MSCOCO EN-FR (test) | BLEU | 44.81 | 19 |
Showing 10 of 17 benchmark rows.