Unsupervised Statistical Machine Translation

About

While modern machine translation has relied on large parallel corpora, a recent line of work has managed to train Neural Machine Translation (NMT) systems from monolingual corpora only (Artetxe et al., 2018c; Lample et al., 2018). Despite the potential of this approach for low-resource settings, existing systems are far behind their supervised counterparts, limiting their practical interest. In this paper, we propose an alternative approach based on phrase-based Statistical Machine Translation (SMT) that significantly closes the gap with supervised systems. Our method profits from the modular architecture of SMT: we first induce a phrase table from monolingual corpora through cross-lingual embedding mappings, combine it with an n-gram language model, and fine-tune hyperparameters through an unsupervised MERT variant. In addition, iterative backtranslation improves results further, yielding, for instance, 14.08 and 26.22 BLEU points in WMT 2014 English-German and English-French, respectively, an improvement of more than 7-10 BLEU points over previous unsupervised systems, and closing the gap with supervised SMT (Moses trained on Europarl) down to 2-5 BLEU points. Our implementation is available at https://github.com/artetxem/monoses
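The phrase-table induction step can be illustrated with a small sketch. The abstract does not state the scoring formula, so this example assumes one plausible choice: translation probabilities computed as a temperature-scaled softmax over cosine similarities between embeddings that have already been mapped into a shared cross-lingual space. The function names, the toy vectors, and the `temperature` and `top_k` values are all hypothetical, and the output only loosely mimics a Moses-style `|||`-delimited phrase table.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def induce_phrase_table(src_emb, tgt_emb, temperature=0.1, top_k=2):
    """Map each source phrase to its top_k target candidates with probabilities.

    Assumption: probabilities are a softmax over cosine similarities,
    normalized over ALL target candidates before truncating to top_k.
    """
    table = {}
    for src, u in src_emb.items():
        sims = {tgt: cosine(u, v) for tgt, v in tgt_emb.items()}
        # Subtract the maximum similarity for numerical stability.
        best = max(sims.values())
        weights = {t: math.exp((s - best) / temperature) for t, s in sims.items()}
        total = sum(weights.values())
        ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
        table[src] = [(t, w / total) for t, w in ranked[:top_k]]
    return table


# Toy embeddings, assumed to be already mapped to a shared space (made-up values).
src_emb = {"house": [0.9, 0.1, 0.0], "cat": [0.0, 0.2, 0.9]}
tgt_emb = {
    "Haus": [0.8, 0.2, 0.1],
    "Katze": [0.1, 0.1, 0.95],
    "Auto": [0.5, 0.8, 0.1],
}

table = induce_phrase_table(src_emb, tgt_emb)
for src, candidates in table.items():
    for tgt, prob in candidates:
        print(f"{src} ||| {tgt} ||| {prob:.4f}")
```

In a full system along the lines the abstract describes, a table like this would be combined with an n-gram language model and tuned, then refined through iterative backtranslation; none of those stages are shown here.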

Mikel Artetxe, Gorka Labaka, Eneko Agirre • 2018

Related benchmarks

Task                         Dataset                                Result          Rank
Machine Translation          WMT En-De 2014 (test)                  BLEU 14.08      379
Machine Translation          WMT En-Fr 2014 (test)                  BLEU 26.22      237
Machine Translation          WMT 2014 (test)                        BLEU 26.22      100
Machine Translation          WMT16 English-German (test)            BLEU 18.2       58
Machine Translation          WMT 2016 (test)                        BLEU 18.23      58
Machine Translation          WMT16 German-English (test)            BLEU 23.1       39
Machine Translation          WMT en-de 2016 (newstest)              BLEU 18.23      9
Machine Translation (De-En)  WMT 2016 (test)                        BLEU 23.05      9
Text Simplification          Wikipedia-SimpleWikipedia (test)       FE-diff 13.84   9
Machine Translation (De-En)  WMT 2014 (test)                        BLEU 17.43      8
Showing 10 of 12 rows
