
Mask-Predict: Parallel Decoding of Conditional Masked Language Models

About

Most machine translation systems generate text autoregressively from left to right. We, instead, use a masked language modeling objective to train a model to predict any subset of the target words, conditioned on both the input text and a partially masked target translation. This approach allows for efficient iterative decoding, where we first predict all of the target words non-autoregressively, and then repeatedly mask out and regenerate the subset of words that the model is least confident about. By applying this strategy for a constant number of iterations, our model improves state-of-the-art performance levels for non-autoregressive and parallel decoding translation models by over 4 BLEU on average. It is also able to reach within about 1 BLEU point of a typical left-to-right transformer model, while decoding significantly faster.
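The decoding strategy described above can be sketched as a short loop. The snippet below is a minimal toy illustration, not the paper's implementation: `predict` is a hypothetical stand-in for a real conditional masked language model, and the linear re-masking schedule (mask the k least-confident positions, with k decaying over iterations) follows the description in the abstract.

```python
# Toy sketch of mask-predict decoding. `predict` is a hypothetical
# stand-in for a trained conditional masked language model (CMLM).

MASK = "<mask>"

def predict(source, target):
    # Dummy CMLM: fills each masked slot by copying the source token at
    # that position, with confidence growing with position index.
    tokens, scores = [], []
    for i, tok in enumerate(target):
        if tok == MASK:
            tokens.append(source[i])
            scores.append(0.5 + 0.5 * i / max(1, len(target) - 1))
        else:
            tokens.append(tok)
            scores.append(1.0)  # kept tokens are treated as fully confident
    return tokens, scores

def mask_predict(source, iterations=4):
    n = len(source)                    # assume target length is known
    target = [MASK] * n
    # Iteration 0: predict every target word in parallel.
    target, scores = predict(source, target)
    for t in range(1, iterations):
        # Linear decay: re-mask fewer tokens each iteration.
        k = int(n * (iterations - t) / iterations)
        if k == 0:
            break
        # Re-mask the k positions the model is least confident about.
        worst = sorted(range(n), key=lambda i: scores[i])[:k]
        for i in worst:
            target[i] = MASK
        target, scores = predict(source, target)
    return target

print(mask_predict(["das", "ist", "gut"]))
```

Because the number of iterations is a constant rather than the target length, the loop runs in O(1) model calls regardless of sentence length, which is the source of the speedup over left-to-right decoding.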

Marjan Ghazvininejad, Omer Levy, Yinhan Liu, Luke Zettlemoyer • 2019

Related benchmarks

Task                | Dataset                           | Metric | Result | Rank
Machine Translation | WMT En-De 2014 (test)             | BLEU   | 27.03  | 379
Machine Translation | IWSLT De-En 2014 (test)           | BLEU   | 33.4   | 146
Machine Translation | WMT 2014 (test)                   | BLEU   | 30.86  | 100
Machine Translation | IWSLT En-De 2014 (test)           | BLEU   | 22     | 92
Machine Translation | WMT En-De '14                     | BLEU   | 18.12  | 89
Machine Translation | WMT Ro-En 2016 (test)             | BLEU   | 33.31  | 82
Machine Translation | WMT14 En-De newstest2014 (test)   | BLEU   | 27.03  | 65
Machine Translation | WMT De-En 14 (test)               | BLEU   | 30.53  | 59
Machine Translation | WMT 2016 (test)                   | BLEU   | 33.06  | 58
Machine Translation | WMT16 EN-RO (test)                | BLEU   | 33.08  | 56

(Showing 10 of 36 rows)
