Improving the Transformer Translation Model with Document-Level Context
About
Although the Transformer translation model (Vaswani et al., 2017) has achieved state-of-the-art performance in a variety of translation tasks, how to use document-level context to handle discourse phenomena that are problematic for the Transformer remains a challenge. In this work, we extend the Transformer model with a new context encoder that represents document-level context, which is then incorporated into the original encoder and decoder. Because large-scale document-level parallel corpora are usually not available, we introduce a two-step training method that takes full advantage of abundant sentence-level parallel corpora and limited document-level parallel corpora. Experiments on the NIST Chinese-English datasets and the IWSLT French-English datasets show that our approach significantly improves over the Transformer.
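The two-step training idea can be illustrated with a toy schedule. This is a minimal sketch and not the authors' implementation. The abstract does not say which parameters are updated in each step, so the sketch assumes that the first step trains only the sentence-level parameters and that the second step freezes them and trains only the context-related parameters. The `Param` class, the group labels, the constant gradients, and the learning rate are all hypothetical stand-ins for real model weights and real corpora.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """A single scalar standing in for a block of model weights (hypothetical)."""
    value: float
    group: str            # "sentence" or "context"; labels are assumptions
    trainable: bool = True


def set_trainable(params, group, flag):
    """Freeze or unfreeze every parameter that belongs to the given group."""
    for p in params.values():
        if p.group == group:
            p.trainable = flag


def train(params, grads, lr, steps):
    """Apply plain gradient steps, skipping frozen parameters."""
    for _ in range(steps):
        for name, p in params.items():
            if p.trainable:
                p.value -= lr * grads[name]


params = {
    "encoder": Param(1.0, "sentence"),
    "decoder": Param(1.0, "sentence"),
    "context_encoder": Param(1.0, "context"),
}
# Constant toy gradients; a real run would compute them from data.
grads = {"encoder": 0.5, "decoder": 0.5, "context_encoder": 0.5}

# Step 1 (assumed): abundant sentence-level data, context module untouched.
set_trainable(params, "context", False)
train(params, grads, lr=0.25, steps=2)
after_step1 = {name: p.value for name, p in params.items()}

# Step 2 (assumed): limited document-level data, sentence-level weights frozen.
set_trainable(params, "sentence", False)
set_trainable(params, "context", True)
train(params, grads, lr=0.25, steps=2)
after_step2 = {name: p.value for name, p in params.items()}
```

Under these assumptions, the small document-level corpus only has to fit the context-related parameters, while the sentence-level weights keep what they learned from the larger corpus.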
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Machine Translation | BConTrasT De=>En (test) | BLEU | 60.08 | 28 |
| Machine Translation | BMELD Ch=>En (test) | BLEU | 22.26 | 28 |
| Machine Translation | BMELD (En=>Ch) (test) | BLEU | 27.13 | 28 |
| Long-form Question Answering | ELI5 | ROUGE-L | 14.52 | 27 |
| En-De Chat Translation | BConTrasT (test) | BLEU | 58.94 | 16 |
| Document-Level Machine Translation | IWSLT Fr-En 2010 (test) | BLEU | 36.85 | 15 |
| Machine Translation | NIST Zh-En sacreBLEU (test) | sacreBLEU | 47.28 | 6 |
| Machine Translation | IWSLT En-De sacreBLEU (test) | sacreBLEU | 28.74 | 6 |
| Knowledge Grounded Dialogue | Wizards of Wikipedia | F1 Score | 34.61 | 6 |