
Focus on the Target's Vocabulary: Masked Label Smoothing for Machine Translation

About

Label smoothing and vocabulary sharing are two widely used techniques in neural machine translation models. However, we argue that naively applying both techniques can be conflicting and even lead to sub-optimal performance. When allocating smoothed probability, original label smoothing treats source-side words that would never appear in the target language the same as real target-side words, which can bias the translation model. To address this issue, we propose Masked Label Smoothing (MLS), a new mechanism that masks the soft label probability of source-side words to zero. Simple yet effective, MLS better integrates label smoothing with vocabulary sharing. Our extensive experiments show that MLS consistently improves over original label smoothing on different datasets, including bilingual and multilingual translation, in terms of both translation quality and model calibration. Our code is released at https://github.com/PKUnlp-icler/MLS
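The core idea can be sketched in a few lines: where standard label smoothing spreads the smoothing mass over the entire shared vocabulary, MLS spreads it only over tokens that can occur in the target language, so source-only tokens receive exactly zero probability. The function below is a minimal illustrative sketch (not the authors' released implementation); the names `masked_label_smoothing` and `target_vocab_ids` are assumptions for illustration.

```python
import numpy as np

def masked_label_smoothing(gold_ids, vocab_size, target_vocab_ids, eps=0.1):
    """Build soft label distributions with Masked Label Smoothing (MLS).

    gold_ids:         list of gold token ids, one per target position
    vocab_size:       size of the shared source/target vocabulary
    target_vocab_ids: ids of tokens that can appear in the target language
    eps:              total smoothing mass

    Unlike vanilla label smoothing, the eps mass is distributed uniformly
    over target-language tokens only; source-only tokens get probability 0.
    """
    mask = np.zeros(vocab_size)
    mask[list(target_vocab_ids)] = 1.0
    # uniform smoothing over target-side tokens only
    smooth = eps * mask / mask.sum()
    dist = np.tile(smooth, (len(gold_ids), 1))
    # the gold token keeps the remaining 1 - eps probability mass
    dist[np.arange(len(gold_ids)), gold_ids] += 1.0 - eps
    return dist
```

For example, with a shared vocabulary of 6 tokens of which only ids 0–3 are target-side, every row of the returned matrix sums to 1 while ids 4 and 5 stay at zero, which is exactly the bias-removal the abstract describes.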

Liang Chen, Runxin Xu, Baobao Chang • 2022

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Machine Translation | WMT En-De '14 | BLEU | 27.91 | 89
Machine Translation | WMT De-En '14 | BLEU | 31.43 | 33
Machine Translation | WMT En-Ro 2016 | BLEU | 20.88 | 28
Machine Translation | IWSLT14 De-En | BLEU Score | 35.04 | 22
Machine Translation | CASIA Zh-En | BLEU | 21.23 | 5
Machine Translation | IWSLT'14 + WMT'16 original (test) | BLEU (DE,RO->EN) | 34.1 | 2
Machine Translation | IWSLT'14 + WMT'16 Balanced (test) | BLEU (DE,RO->EN) | 33.53 | 2
Machine Translation | DE-EN | Inference ECE | 9.67 | 2
Machine Translation | VI-EN | Inference ECE | 12.63 | 2
Machine Translation | DE,RO-EN | Inference ECE | 11.37 | 2

(10 of 11 rows shown)
