Text Smoothing: Enhance Various Data Augmentation Methods on Text Classification Tasks

About

Before entering the neural network, a token is generally converted to its one-hot representation, which is a discrete distribution over the vocabulary. A smoothed representation is the probability distribution over candidate tokens obtained from a pre-trained masked language model, and can be seen as a more informative substitute for the one-hot representation. We propose an efficient data augmentation method, termed text smoothing, which converts a sentence from its one-hot representation to a controllable smoothed representation. We evaluate text smoothing on different benchmarks in a low-resource regime. Experimental results show that text smoothing outperforms various mainstream data augmentation methods by a substantial margin. Moreover, text smoothing can be combined with those data augmentation methods to achieve better performance.
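The idea of swapping a one-hot vector for a smoothed one can be illustrated with a small sketch. The abstract does not spell out how the representation is made "controllable", so the interpolation weight `lam` and the `temperature` parameter below are assumptions, as are all function names. The toy logits stand in for the output of a real pre-trained masked language model, which is not loaded here.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution over the vocabulary."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(index, vocab_size):
    """Discrete distribution that puts all mass on a single token."""
    return [1.0 if i == index else 0.0 for i in range(vocab_size)]


def smooth_token(index, mlm_logits, lam=0.5, temperature=1.0):
    """Blend the one-hot vector with a (hypothetical) MLM distribution.

    lam = 1.0 keeps the original one-hot vector; lam = 0.0 uses only the
    language-model distribution. This mixing rule is an assumption.
    """
    probs = softmax(mlm_logits, temperature)
    hard = one_hot(index, len(mlm_logits))
    return [lam * h + (1.0 - lam) * p for h, p in zip(hard, probs)]


def embed(distribution, embedding_matrix):
    """Weighted average of embedding rows; with a one-hot input this is a lookup."""
    dim = len(embedding_matrix[0])
    return [
        sum(w * row[d] for w, row in zip(distribution, embedding_matrix))
        for d in range(dim)
    ]


# Toy vocabulary and made-up scores for the position holding "good".
vocab = ["good", "great", "bad", "movie"]
toy_logits = [2.0, 1.5, -1.0, -3.0]
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [0.0, 1.0]]

mixed = smooth_token(vocab.index("good"), toy_logits, lam=0.5)
mixed_embedding = embed(mixed, toy_embeddings)
```

Because the embedding layer is a matrix product, feeding it the blended vector yields a weighted average of token embeddings, so the augmented input needs no change to the rest of the classifier.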

Xing Wu, Chaochen Gao, Meng Lin, Liangjun Zang, Zhongyuan Wang, Songlin Hu • 2022

Related benchmarks

Task                  Dataset   Metric    Result  Rank
Text Classification   AG-News   Accuracy  89.9    248
Text Classification   TREC      Accuracy  95.8    179
Topic Classification  Yahoo     Accuracy  68.9    42