FlipDA: Effective and Robust Data Augmentation for Few-Shot Learning
About
Most previous methods for text data augmentation are limited to simple tasks and weak baselines. We explore data augmentation on hard tasks (i.e., few-shot natural language understanding) and strong baselines (i.e., pretrained models with over one billion parameters). Under this setting, we reproduced a large number of previous augmentation methods and found that these methods bring marginal gains at best and sometimes substantially degrade performance. To address this challenge, we propose FlipDA, a novel data augmentation method that jointly uses a generative model and a classifier to generate label-flipped data. Central to FlipDA is the discovery that generating label-flipped data is more crucial to performance than generating label-preserved data. Experiments show that FlipDA achieves a good tradeoff between effectiveness and robustness -- it substantially improves many tasks while not negatively affecting the others.
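The core selection loop can be sketched as follows. This is a minimal illustration, not the paper's implementation: `generate_candidates` and `toy_classifier` are hypothetical stand-ins (FlipDA uses a pretrained T5 for pattern-based cloze generation and the task classifier itself for filtering), but the filtering rule — keep only generated examples whose predicted label flips with sufficient confidence — matches the idea described above.

```python
import random

def generate_candidates(text, n=5, rng=None):
    # Stand-in for T5 cloze-style generation: each candidate simply
    # drops one random word from the input.
    rng = rng or random.Random(0)
    words = text.split()
    return [" ".join(words[:i] + words[i + 1:])
            for i in (rng.randrange(len(words)) for _ in range(n))]

def toy_classifier(text):
    # Stand-in for the few-shot classifier: a keyword-based
    # probability distribution over two sentiment labels.
    p_pos = 0.9 if "good" in text else 0.1
    return {"positive": p_pos, "negative": 1.0 - p_pos}

def flipda_augment(text, label, threshold=0.5):
    """Keep only candidates whose predicted label FLIPS relative to the
    original label, with classifier confidence >= threshold."""
    flipped = []
    for cand in generate_candidates(text):
        probs = toy_classifier(cand)
        pred = max(probs, key=probs.get)
        if pred != label and probs[pred] >= threshold:
            flipped.append((cand, pred))  # label-flipped augmented pair
    return flipped

augmented = flipda_augment("the movie was good fun", "positive")
```

Every pair returned by `flipda_augment` carries the new (flipped) label, so the augmented examples can be added directly to the training set.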
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Sentiment Analysis | SST-2 (test) | Accuracy | 94.3 | 136 |
| Commonsense Question Answering | CSQA (test) | Accuracy | 0.77 | 127 |
| Natural Language Inference | MNLI (matched) | Accuracy | 68.8 | 110 |
| Topic Classification | AG News (test) | Accuracy | 85.2 | 98 |
| Natural Language Inference | MNLI (mismatched) | Accuracy | 68.9 | 68 |
| Aspect-based Sentiment Analysis | SemEval Restaurant 2014 (All) | F1 Score | 51.38 | 19 |
| Aspect-based Sentiment Analysis | SemEval Laptop 2014 | F1 Score | 32.81 | 19 |
| Natural Language Understanding | SuperGLUE few-shot | BoolQ Accuracy | 0.818 | 16 |
| Emotion Classification | TweetEmo (test) | Accuracy | 76.7 | 13 |
| Conditional Text Generation | CommonGen | ROUGE-1 | 46.81 | 6 |