Mask-Predict: Parallel Decoding of Conditional Masked Language Models
About
Most machine translation systems generate text autoregressively from left to right. We, instead, use a masked language modeling objective to train a model to predict any subset of the target words, conditioned on both the input text and a partially masked target translation. This approach allows for efficient iterative decoding, where we first predict all of the target words non-autoregressively, and then repeatedly mask out and regenerate the subset of words that the model is least confident about. By applying this strategy for a constant number of iterations, our model improves state-of-the-art performance levels for non-autoregressive and parallel decoding translation models by over 4 BLEU on average. It is also able to reach within about 1 BLEU point of a typical left-to-right transformer model, while decoding significantly faster.
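The decoding loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `predict_fn` is a hypothetical stand-in for the trained conditional masked language model, and the linear masking schedule n = N·(T−t)/T follows the description in the text.

```python
def mask_predict(predict_fn, length, iterations):
    """Sketch of mask-predict iterative decoding.

    predict_fn(tokens, mask) -> (tokens, probs): a stand-in for the
    conditional masked LM; it fills every masked position in parallel
    and returns a per-position confidence.
    """
    MASK = None
    tokens = [MASK] * length              # start with a fully masked target
    for t in range(iterations):
        mask = [tok is MASK for tok in tokens]
        tokens, probs = predict_fn(tokens, mask)
        # linear schedule: re-mask the n least-confident positions,
        # where n shrinks to 0 over the fixed number of iterations
        n = int(length * (iterations - 1 - t) / iterations)
        if n == 0:
            break
        worst = sorted(range(length), key=lambda i: probs[i])[:n]
        for i in worst:
            tokens[i] = MASK
    return tokens


def toy_predict(tokens, mask):
    # Hypothetical toy "model" for demonstration only: fills each masked
    # slot with its position index and reports rising confidences.
    out = [i if m else tok for i, (tok, m) in enumerate(zip(tokens, mask))]
    probs = [0.5 + 0.1 * i for i in range(len(tokens))]
    return out, probs


print(mask_predict(toy_predict, 5, 3))  # → [0, 1, 2, 3, 4]
```

With a real model, `predict_fn` would condition on the source sentence as well; the key point is that every masked position is filled in parallel at each step, so the cost is a constant number of forward passes rather than one per target token.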
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Machine Translation | WMT En-De 2014 (test) | BLEU | 27.03 | 379 |
| Machine Translation | IWSLT De-En 2014 (test) | BLEU | 33.4 | 146 |
| Machine Translation | WMT 2014 (test) | BLEU | 30.86 | 100 |
| Machine Translation | IWSLT En-De 2014 (test) | BLEU | 22 | 92 |
| Machine Translation | WMT En-De '14 | BLEU | 18.12 | 89 |
| Machine Translation | WMT Ro-En 2016 (test) | BLEU | 33.31 | 82 |
| Machine Translation | WMT14 En-De newstest2014 (test) | BLEU | 27.03 | 65 |
| Machine Translation | WMT De-En 14 (test) | BLEU | 30.53 | 59 |
| Machine Translation | WMT 2016 (test) | BLEU | 33.06 | 58 |
| Machine Translation | WMT16 EN-RO (test) | BLEU | 33.08 | 56 |