Neural Machine Translation by Jointly Learning to Align and Translate
About
Neural machine translation is a recently proposed approach to machine translation. Unlike traditional statistical machine translation, neural machine translation aims at building a single neural network that can be jointly tuned to maximize translation performance. The models proposed recently for neural machine translation often belong to a family of encoder-decoders and consist of an encoder that encodes a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend it by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts explicitly as a hard segment. With this new approach, we achieve translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.
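The (soft-)search described above can be illustrated with a minimal sketch of one decoding step. It assumes an additive scoring function (a small feed-forward layer over the previous decoder state and each encoder annotation) followed by a softmax; the abstract itself does not spell out the formula. The parameter names `W`, `U`, `v`, the dimensions, and the random initialization are all hypothetical choices for illustration, not the authors' implementation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def soft_attention(decoder_state, annotations, W, U, v):
    """Return (weights, context) for one decoding step.

    decoder_state: previous decoder hidden state, length d_dec
    annotations:   one encoder vector per source position, each of length d_enc
    W, U, v:       hypothetical learned parameters (random in this sketch)
    """
    projected_state = matvec(W, decoder_state)
    scores = []
    for h in annotations:
        projected_h = matvec(U, h)
        hidden = [math.tanh(a + b) for a, b in zip(projected_state, projected_h)]
        # Scalar relevance score of this source position for the next target word.
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    # Soft alignment: a probability distribution over source positions.
    weights = softmax(scores)
    d_enc = len(annotations[0])
    # Context vector: weighted average of annotations, recomputed at every step
    # instead of a single fixed-length sentence vector.
    context = [sum(w * h[k] for w, h in zip(weights, annotations))
               for k in range(d_enc)]
    return weights, context


def random_matrix(rows, cols, rng):
    """Illustrative stand-in for trained weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_dec, d_enc, d_att, src_len = 4, 6, 5, 3  # assumed toy dimensions
W = random_matrix(d_att, d_dec, rng)
U = random_matrix(d_att, d_enc, rng)
v = [rng.uniform(-0.5, 0.5) for _ in range(d_att)]
state = [rng.uniform(-1, 1) for _ in range(d_dec)]
annotations = random_matrix(src_len, d_enc, rng)

weights, context = soft_attention(state, annotations, W, U, v)
```

Because the weights form a distribution over all source positions rather than a single hard choice, the whole computation stays differentiable, which is what lets alignment and translation be trained jointly.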
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multivariate Forecasting | ETTh1 | MSE | 0.991 | 645 |
| Multivariate Time-series Forecasting | ETTm1 | MSE | 0.444 | 433 |
| Multivariate long-term series forecasting | ETTh2 | MSE | 1.552 | 319 |
| Machine Translation | WMT En-Fr 2014 (test) | BLEU | 28.45 | 237 |
| Long-term time-series forecasting | ETTh1 (test) | MSE | 0.114 | 221 |
| Hallucination Detection | TriviaQA (test) | AUC-ROC | 42 | 169 |
| Machine Translation | IWSLT De-En 2014 (test) | BLEU | 29.98 | 146 |
| Multimodal Machine Translation | Multi30K (test) | BLEU-4 | 33.7 | 139 |
| Speech Recognition | WSJ (92-eval) | WER | 16 | 131 |
| Scene Text Recognition | SVT 647 (test) | Accuracy | 85.9 | 101 |