Contrastive Learning for Many-to-many Multilingual Neural Machine Translation

About

Existing multilingual machine translation approaches mainly focus on English-centric directions, while the non-English directions still lag behind. In this work, we aim to build a many-to-many translation system with an emphasis on the quality of non-English language directions. Our intuition is based on the hypothesis that a universal cross-language representation leads to better multilingual translation performance. To this end, we propose mRASP2, a training method to obtain a single unified multilingual translation model. mRASP2 is empowered by two techniques: a) a contrastive learning scheme to close the gap among representations of different languages, and b) data augmentation on both multiple parallel and monolingual data to further align token representations. For English-centric directions, mRASP2 outperforms the existing best unified model and achieves competitive or even better performance than the pre-trained and fine-tuned model mBART on tens of WMT translation directions. For non-English directions, mRASP2 achieves an average improvement of more than 10 BLEU over the multilingual Transformer baseline. Code, data and trained models are available at https://github.com/PANXiao1994/mRASP2.
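
To make the idea of "closing the gap among representations of different languages" concrete, here is a minimal sketch of a generic in-batch contrastive (InfoNCE-style) loss over sentence representations. It is an illustration only, not the mRASP2 implementation: the abstract does not specify the loss, so the function names (cosine, contrastive_loss), the use of cosine similarity, and the temperature value are all assumptions made for this example. The released code at the URL above is the authoritative reference.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(src_reprs, tgt_reprs, temperature=0.1):
    """Generic in-batch contrastive loss (hypothetical helper).

    src_reprs[i] and tgt_reprs[i] are representations of a sentence and
    its translation (the positive pair). Every other tgt_reprs[j] in the
    batch serves as a negative for src_reprs[i].
    """
    total = 0.0
    for i, anchor in enumerate(src_reprs):
        # Temperature-scaled similarity of the anchor to every target.
        logits = [cosine(anchor, t) / temperature for t in tgt_reprs]
        # Numerically stable log-sum-exp for the softmax denominator.
        top = max(logits)
        log_denom = top + math.log(sum(math.exp(x - top) for x in logits))
        # Negative log-probability assigned to the true translation.
        total += log_denom - logits[i]
    return total / len(src_reprs)


# Toy 2-d "sentence representations" (made-up values, not real data).
src = [[1.0, 0.0], [0.0, 1.0]]
aligned_tgt = [[0.9, 0.1], [0.1, 0.9]]   # each translation near its source
shuffled_tgt = [[0.1, 0.9], [0.9, 0.1]]  # translations swapped

loss_aligned = contrastive_loss(src, aligned_tgt)
loss_shuffled = contrastive_loss(src, shuffled_tgt)
```

In this toy setup the loss is small when each source representation lies closest to its own translation and large when the pairs are swapped, which is the pressure that pulls representations of parallel sentences together across languages.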

Xiao Pan, Mingxuan Wang, Liwei Wu, Lei Li • 2021

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Machine Translation | WMT En-Fr 2014 (test) | BLEU | 43.5 | 237 |
| Machine Translation | WMT16 EN-RO (test) | BLEU | 39.1 | 56 |
| Machine Translation | OPUS-100 (test) | Average BLEU Score | 15.31 | 19 |
| Machine Translation | WMT En-Tr 17 | BLEU | 25.8 | 17 |
| Machine Translation | OPUS-7 (test) | Translation Score (X -> Ar) | 74.36 | 17 |
| Machine Translation | IWSLT 2017 (test) | De-It Translation Score | 77.01 | 15 |
| Machine Translation | WMT En-Fi 17 (test) | BLEU (tokenized) | 30.1 | 14 |
| Machine Translation | PC-6 (test) | Translation Score (x -> Cs) | 68.98 | 13 |
| Machine Translation | WMT En-Es 13 (test) | Tokenized BLEU | 35 | 10 |
| Machine Translation | WMT En-Tr 17 (test) | BLEU (tokenized) | 21.4 | 6 |
Showing 10 of 22 rows
