
Weighted Transformer Network for Machine Translation

About

State-of-the-art results on neural machine translation are often achieved by attentional sequence-to-sequence models that incorporate some form of convolution or recurrence. Vaswani et al. (2017) propose a new architecture that avoids recurrence and convolution completely. Instead, it uses only self-attention and feed-forward layers. While the proposed architecture achieves state-of-the-art results on several machine translation tasks, it requires a large number of parameters and training iterations to converge. We propose the Weighted Transformer, a Transformer with modified attention layers, that not only outperforms the baseline network in BLEU score but also converges 15-40% faster. Specifically, we replace multi-head attention with multiple self-attention branches that the model learns to combine during training. Our model improves the state of the art by 0.5 BLEU points on the WMT 2014 English-to-German translation task and by 0.4 BLEU points on the English-to-French translation task.

Karim Ahmed, Nitish Shirish Keskar, Richard Socher · 2017
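
As a rough illustration of the branch-and-combine idea from the abstract, here is a minimal PyTorch sketch: multi-head attention is replaced by independent self-attention branches whose outputs are summed with learned weights rather than concatenated. The class name, the softmax normalization of the branch weights, and the per-branch output projections are illustrative assumptions, not the paper's exact formulation (which, among other details, also learns weights over per-branch feed-forward outputs).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BranchedSelfAttention(nn.Module):
    """Sketch of attention with multiple self-attention branches whose
    outputs are combined via weights learned during training. Masking and
    dropout are omitted for brevity; names and normalization are assumptions.
    """
    def __init__(self, d_model=512, num_branches=8):
        super().__init__()
        assert d_model % num_branches == 0
        self.num_branches = num_branches
        self.d_head = d_model // num_branches
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # One output projection per branch (each branch acts like one head).
        self.out_proj = nn.ModuleList(
            [nn.Linear(self.d_head, d_model) for _ in range(num_branches)]
        )
        # Learned combination weights: the quantity the model "learns to
        # combine during training".
        self.branch_logits = nn.Parameter(torch.zeros(num_branches))

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        B, T, _ = x.shape

        def split(t):  # -> (batch, branches, seq_len, d_head)
            return t.view(B, T, self.num_branches, self.d_head).transpose(1, 2)

        q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))
        scores = q @ k.transpose(-2, -1) / (self.d_head ** 0.5)
        heads = F.softmax(scores, dim=-1) @ v          # (B, M, T, d_head)
        weights = F.softmax(self.branch_logits, dim=0)  # normalized to sum to 1
        # Weighted sum of branch outputs instead of concatenation.
        return sum(w * proj(heads[:, i])
                   for i, (w, proj) in enumerate(zip(weights, self.out_proj)))

# Usage: BranchedSelfAttention()(torch.randn(2, 10, 512)) -> shape (2, 10, 512)
```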

Related benchmarks

Task                                | Dataset                              | Metric     | Result | Rank
Machine Translation                 | WMT En-De 2014 (test)                | BLEU       | 28.9   | 379
Machine Translation                 | WMT En-Fr 2014 (test)                | BLEU       | 41.4   | 237
Machine Translation                 | WMT English-German 2014 (test)       | BLEU       | 28.9   | 136
Machine Translation                 | WMT En-De '14                        | BLEU       | 28.9   | 89
Machine Translation                 | WMT14 En-De newstest2014 (test)      | BLEU       | 28.9   | 65
Machine Translation                 | WMT en-fr 14                         | BLEU       | 41.4   | 56
Machine Translation                 | WMT En-Fr newstest 2014 (test)       | BLEU       | 41.4   | 46
Machine Translation                 | WMT14 English-French (newstest2014)  | BLEU       | 41.4   | 39
English-German Machine Translation  | WMT (newstest2014)                   | BLEU       | 28.9   | 19
Machine Translation                 | WMT English-German 2014 (newstest)   | BLEU (tok) | 28.9   | 10
