
Gumbel-Attention for Multi-modal Machine Translation

About

Multi-modal machine translation (MMT) improves translation quality by introducing visual information. However, existing MMT models ignore the problem that images can carry information irrelevant to the text, which adds noise to the model and degrades translation quality. This paper proposes a novel Gumbel-Attention mechanism for multi-modal machine translation that selects the text-related parts of the image features. Specifically, unlike previous attention-based methods, we use a differentiable method to select image information and automatically discard the useless parts of the image features. Experiments show that our method retains the image features related to the text, and that the retained parts help the MMT model generate better translations.
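The core idea described above is a differentiable hard selection over image regions. A common way to realise this is Gumbel noise applied to relevance logits, so the model can learn a near-binary keep/drop gate while remaining differentiable. The following is a minimal NumPy sketch of that idea (the function names, shapes, and the use of a Gumbel-Sigmoid gate are illustrative assumptions, not the paper's exact implementation):

```python
import numpy as np

def gumbel_sigmoid(logits, tau=1.0, rng=None):
    """Relaxed binary gate: perturb logits with Gumbel noise so a hard
    keep/drop decision stays differentiable (hypothetical sketch)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    # The difference of two Gumbel(0, 1) samples is Logistic noise,
    # the binary analogue of the Gumbel-Softmax trick.
    g1 = -np.log(-np.log(rng.uniform(size=logits.shape)))
    g2 = -np.log(-np.log(rng.uniform(size=logits.shape)))
    return 1.0 / (1.0 + np.exp(-(logits + g1 - g2) / tau))

def gumbel_attention(text, image, tau=0.5):
    """Score each image region against each text token, then gate out
    regions the noisy relevance score marks as irrelevant."""
    # (T, R) relevance logits, scaled dot-product as in standard attention
    scores = text @ image.T / np.sqrt(text.shape[-1])
    gate = gumbel_sigmoid(scores, tau)            # soft 0/1 selection mask
    weights = np.exp(scores) * gate               # mask the attention weights
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ image                        # text-conditioned image context

# Toy example: 3 text tokens and 4 image regions, feature dim 8.
rng = np.random.default_rng(1)
ctx = gumbel_attention(rng.normal(size=(3, 8)), rng.normal(size=(4, 8)))
print(ctx.shape)  # (3, 8)
```

At low temperature `tau` the gate saturates toward 0 or 1, approximating the hard removal of text-irrelevant regions while keeping gradients flowing during training.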

Pengbo Liu, Hailong Cao, Tiejun Zhao • 2021

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Multimodal Machine Translation (English-German) | Multi30K 2016 (test) | BLEU 39.2 | 52 |
| Multimodal Machine Translation | Multi30k En-De 2017 (test) | METEOR 51.2 | 45 |
| Multi-modal Machine Translation | Multi30k WMT17 (test) | BLEU 31.4 | 16 |
| Multimodal Machine Translation | MSCOCO Ambiguous EN-DE (test) | BLEU 26.9 | 13 |
| Machine Translation (En-De) | Multi30K MSCOCO | BLEU 26.9 | 12 |
| Multi-modal Machine Translation | MSCOCO (test) | BLEU 26.9 | 5 |
