Empowering Large Language Models for Textual Data Augmentation

About

Because they can understand and execute natural language instructions, large language models (LLMs) can potentially serve as a powerful tool for textual data augmentation. However, the quality of the augmented data depends heavily on the augmentation instructions provided, and their effectiveness can fluctuate across downstream tasks. While manually crafting and selecting instructions can offer some improvement, this approach faces scalability and consistency issues in practice because downstream tasks are so diverse. In this work, we address these limitations by proposing a new solution that automatically generates a large pool of augmentation instructions and selects the most suitable task-informed instructions, thereby empowering LLMs to create high-quality augmented data for different downstream tasks. Empirically, the proposed approach consistently generates higher-quality augmented data than both non-LLM and LLM-based data augmentation methods, leading to the best performance on 26 few-shot learning tasks sourced from a wide range of application domains.

Yichuan Li, Kaize Ding, Jianling Wang, Kyumin Lee • 2024
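
The two-stage idea in the abstract (build a pool of augmentation instructions, then pick the one best suited to the task) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say how instructions are generated or scored, so `score` and `llm` are hypothetical placeholder callables, and the toy scores and toy "LLM" below merely stand in for a real selection model and a real language model.

```python
from typing import Callable, List, Sequence


def select_instruction(
    instructions: Sequence[str],
    score: Callable[[str], float],
) -> str:
    """Return the instruction with the highest task-specific score.

    `score` is a hypothetical placeholder for whatever task-informed
    selection mechanism is used; the abstract does not specify it.
    """
    if not instructions:
        raise ValueError("instruction pool is empty")
    return max(instructions, key=score)


def augment(
    examples: Sequence[str],
    instruction: str,
    llm: Callable[[str], str],
) -> List[str]:
    """Apply one augmentation instruction to every example.

    `llm` is a hypothetical callable mapping a prompt to generated text.
    """
    return [llm(f"{instruction}\n\nText: {text}") for text in examples]


# Toy stand-ins so the sketch runs without any model (illustrative only).
pool = [
    "Paraphrase the text.",
    "Replace some words with synonyms.",
    "Translate to French and back to English.",
]
toy_scores = {pool[0]: 0.7, pool[1]: 0.4, pool[2]: 0.6}


def toy_llm(prompt: str) -> str:
    # Not a real model: upper-cases the prompt to show the data flow.
    return prompt.upper()


best = select_instruction(pool, toy_scores.__getitem__)
augmented = augment(["the movie was great"], best, toy_llm)
```

In a real setting the pool would itself be produced by prompting an LLM, and the scorer would depend on the downstream task; both are left abstract here on purpose.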

Related benchmarks

Task	Dataset	Result
Few-shot Text Classification	26 few-shot tasks Class -> Class transfer setting (test)	Accuracy 54.98	84
Few-shot Text Classification	26 few-shot tasks Non-Class -> Class transfer setting (test)	Accuracy 52.75	84
Few-shot Text Classification	26 few-shot tasks Random -> Random transfer setting (test)	Accuracy 48.95	84
Few-shot Text Classification	26 few-shot tasks Class -> Non-Class transfer setting (test)	Accuracy 43.8	84
Text Classification	Class -> Class	Accuracy 54.98	10
Text Classification	Non-Class -> Class	Accuracy 52.75	10
NLP Tasks	Consolidated NLP Tasks (eval)	Score (Single Best Aug) 47.8	9
Text Classification	Unspecified Dataset Class -> Non-Class	Accuracy 42.8	8
Text Classification	Random -> Random	Accuracy 48.83	8

