Very Deep Transformers for Neural Machine Translation
About
We explore the application of very deep Transformer models for Neural Machine Translation (NMT). Using a simple yet effective initialization technique that stabilizes training, we show that it is feasible to build standard Transformer-based models with up to 60 encoder layers and 12 decoder layers. These deep models outperform their baseline 6-layer counterparts by as much as 2.5 BLEU, and achieve new state-of-the-art benchmark results on WMT14 English-French (43.8 BLEU, and 46.4 BLEU with back-translation) and WMT14 English-German (30.1 BLEU). The code and trained models will be publicly available at: https://github.com/namisan/exdeep-nmt.
Xiaodong Liu, Kevin Duh, Liyuan Liu, Jianfeng Gao • 2020
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Machine Translation | WMT En-De '14 | BLEU | 29.5 | 89 |
| Machine Translation | WMT en-fr 14 | BLEU Score | 41.8 | 56 |
| Machine Translation | WMT English-French 2014 (test) | BLEU | 43.8 | 41 |
| Machine Translation | WMT14 English-French (newstest2014) | BLEU | 43.8 | 39 |
| Machine Translation | WMT English-German (EN-DE) 2014 (test) | BLEU Score | 30.1 | 11 |
| Machine Translation | WMT'14 1.2.10 (test) | BLEU | 46.4 | 7 |
| Machine Translation | WMT English-German newstest2014 (test) | BLEU | 30.1 | 7 |