PromDA: Prompt-based Data Augmentation for Low-Resource NLU Tasks

About

This paper focuses on data augmentation for low-resource Natural Language Understanding (NLU) tasks. We propose the Prompt-based Data Augmentation model (PromDA), which trains only a small-scale Soft Prompt (i.e., a set of trainable vectors) in frozen Pre-trained Language Models (PLMs). This avoids human effort in collecting unlabeled in-domain data and maintains the quality of the generated synthetic data. In addition, PromDA generates synthetic data via two different views and filters out low-quality data using NLU models. Experiments on four benchmarks show that the synthetic data produced by PromDA successfully boost the performance of NLU models, which consistently outperform several competitive baseline models, including a state-of-the-art semi-supervised model using unlabeled in-domain data. The synthetic data from PromDA are also complementary to unlabeled in-domain data; NLU models can be further improved when the two are combined for training.
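
To make the core mechanism concrete, here is a minimal sketch in PyTorch of the two ideas the abstract describes: training only a small soft prompt while the PLM stays frozen, and filtering generated synthetic data with an NLU model. It assumes a T5-style seq2seq PLM via Hugging Face transformers; the names SoftPromptGenerator and filter_synthetic, and the nlu_model.predict interface, are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn
from transformers import AutoModelForSeq2SeqLM

class SoftPromptGenerator(nn.Module):
    """Frozen seq2seq PLM with a trainable soft prompt prepended to the input.
    (Hypothetical sketch; not the authors' implementation.)"""

    def __init__(self, model_name: str = "t5-base", prompt_length: int = 20):
        super().__init__()
        self.plm = AutoModelForSeq2SeqLM.from_pretrained(model_name)
        for param in self.plm.parameters():
            param.requires_grad = False  # freeze every PLM weight
        hidden = self.plm.config.d_model
        # The soft prompt: the only trainable parameters in the model.
        self.soft_prompt = nn.Parameter(torch.randn(prompt_length, hidden) * 0.02)

    def forward(self, input_ids, attention_mask, labels):
        embeds = self.plm.get_input_embeddings()(input_ids)        # (B, L, H)
        batch = embeds.size(0)
        prompt = self.soft_prompt.unsqueeze(0).expand(batch, -1, -1)
        inputs_embeds = torch.cat([prompt, embeds], dim=1)         # prepend prompt
        prompt_mask = torch.ones(batch, self.soft_prompt.size(0),
                                 device=attention_mask.device,
                                 dtype=attention_mask.dtype)
        attention_mask = torch.cat([prompt_mask, attention_mask], dim=1)
        return self.plm(inputs_embeds=inputs_embeds,
                        attention_mask=attention_mask,
                        labels=labels)

def filter_synthetic(examples, nlu_model, min_confidence: float = 0.9):
    """Keep generated (text, label) pairs that the NLU model labels confidently
    and consistently with the label they were generated for."""
    kept = []
    for text, label in examples:
        pred_label, confidence = nlu_model.predict(text)  # assumed interface
        if pred_label == label and confidence >= min_confidence:
            kept.append((text, label))
    return kept
```

Because only self.soft_prompt requires gradients, the optimizer can be built over just that parameter, e.g. torch.optim.AdamW([model.soft_prompt], lr=0.3), keeping the trainable footprint tiny relative to the frozen PLM.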

Yufei Wang, Can Xu, Qingfeng Sun, Huang Hu, Chongyang Tao, Xiubo Geng, Daxin Jiang • 2022

Related benchmarks

Task                     | Dataset                      | Metric      | Result | Rank
Named Entity Recognition | CoNLL 03                     | F1 (Entity) | 82.14  | 102
Named Entity Recognition | OntoNotes                    | F1-score    | 57.64  | 91
Sequence Classification  | Yahoo                        | Micro F1    | 56.27  | 64
Sequence Classification  | ATIS                         | Micro F1    | 96.95  | 64
Sequence Classification  | Huffpost low-resource (test) | Micro F1    | 81.06  | 64
Sequence Classification  | MASSIVE                      | Micro F1    | 76.87  | 64
Sequence Classification  | IMDB                         | Micro F1    | 88.65  | 64
Named Entity Recognition | MultiCoNER                   | F1 Score    | 0.5502 | 48
