Data Augmentation for Low-Resource Neural Machine Translation
About
The quality of a neural machine translation system depends substantially on the availability of sizable parallel corpora. For low-resource language pairs such corpora are not available, which results in poor translation quality. Inspired by work in computer vision, we propose a novel data augmentation approach that targets low-frequency words by generating new sentence pairs containing rare words in new, synthetically created contexts. Experimental results on simulated low-resource settings show that our method improves translation quality by up to 2.9 BLEU points over the baseline and up to 3.2 BLEU points over back-translation.
Marzieh Fadaee, Arianna Bisazza, Christof Monz • 2017
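The core idea in the abstract above, placing rare words into new contexts on both sides of a sentence pair, can be illustrated with a toy sketch. This is not the authors' implementation: the `lexicon` and `substitutions` dictionaries are hypothetical stand-ins for whatever models would propose substitutions and their translations, the function names are invented for illustration, and the three-sentence corpus is made up.

```python
from collections import Counter


def rare_words(corpus, max_count):
    """Return source-side words that occur at most max_count times."""
    counts = Counter(w for src, _ in corpus for w in src.split())
    return {w for w, c in counts.items() if c <= max_count}


def augment(corpus, lexicon, substitutions, max_count=1):
    """Create new sentence pairs that put rare source words in new contexts.

    corpus:        list of (source, target) sentence strings
    lexicon:       ASSUMPTION: dict mapping a source word to one target word
    substitutions: ASSUMPTION: dict mapping a source word to rare words that
                   could plausibly replace it (hard-coded here for illustration)
    """
    rare = rare_words(corpus, max_count)
    new_pairs = []
    for src, tgt in corpus:
        src_tokens, tgt_tokens = src.split(), tgt.split()
        for i, word in enumerate(src_tokens):
            for candidate in substitutions.get(word, []):
                # Only substitute in words that are actually rare and translatable.
                if candidate not in rare:
                    continue
                if word not in lexicon or candidate not in lexicon:
                    continue
                # Locate the target word to swap via the toy lexicon.
                old_tgt = lexicon[word]
                if old_tgt not in tgt_tokens:
                    continue
                j = tgt_tokens.index(old_tgt)
                new_src = src_tokens[:i] + [candidate] + src_tokens[i + 1:]
                new_tgt = tgt_tokens[:j] + [lexicon[candidate]] + tgt_tokens[j + 1:]
                new_pairs.append((" ".join(new_src), " ".join(new_tgt)))
    return new_pairs


# Toy English-German data, invented for this example.
corpus = [
    ("i read a book", "ich lese ein buch"),
    ("she bought a book", "sie kaufte ein buch"),
    ("the pamphlet arrived", "das pamphlet kam an"),
]
lexicon = {"book": "buch", "pamphlet": "pamphlet"}
substitutions = {"book": ["pamphlet"]}

augmented = augment(corpus, lexicon, substitutions)
for pair in augmented:
    print(pair)
```

In this sketch the rare word "pamphlet", seen once in the toy corpus, gains two additional training contexts. A realistic system would need a principled way to choose plausible substitution positions and to keep the target side consistent with the changed source.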
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Few-shot Text Classification | 26 few-shot tasks Class -> Non-Class transfer setting (test) | Accuracy | 43.8 | 84 |
| Few-shot Text Classification | 26 few-shot tasks Class -> Class transfer setting (test) | Accuracy | 46.23 | 84 |
| Few-shot Text Classification | 26 few-shot tasks Non-Class -> Class transfer setting (test) | Accuracy | 0.4739 | 84 |
| Few-shot Text Classification | 26 few-shot tasks Random -> Random transfer setting (test) | Accuracy | 44.41 | 84 |
| Natural Language Understanding | SuperGLUE few-shot | BoolQ Accuracy | 0.7992 | 16 |