
AEDA: An Easier Data Augmentation Technique for Text Classification

About

This paper proposes the AEDA (An Easier Data Augmentation) technique to help improve performance on text classification tasks. AEDA consists solely of randomly inserting punctuation marks into the original text. It is easier to implement than the EDA method (Wei and Zou, 2019), against which we compare our results. In addition, it preserves the order of the words while shifting their positions in the sentence, which leads to better generalization. Furthermore, the deletion operation in EDA can cause a loss of information that, in turn, misleads the network, whereas AEDA preserves all of the input information. Following the baseline, we perform experiments on five different datasets for text classification. We show that models trained on AEDA-augmented data outperform those trained on EDA-augmented data on all five datasets. The source code is available for further study and reproduction of the results.
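
The random punctuation insertion described above can be illustrated with a minimal sketch. It is not the authors' released code: the function name `aeda_augment`, the punctuation set `PUNCTUATION`, and the `ratio` cap on the number of insertions are assumptions made for illustration, since the abstract does not specify them.

```python
import random

# Assumed punctuation set; the abstract only says "punctuation marks".
PUNCTUATION = [".", ";", "?", ":", "!", ","]


def aeda_augment(sentence, ratio=0.3, rng=None):
    """Return a copy of `sentence` with random punctuation tokens inserted.

    `ratio` (an assumed parameter) caps the number of insertions at a
    fraction of the word count; at least one mark is always inserted.
    """
    rng = rng or random.Random()
    words = sentence.split()
    if not words:
        return sentence

    # Draw how many marks to insert: between 1 and ratio * len(words).
    max_insertions = max(1, int(ratio * len(words)))
    n_insertions = rng.randint(1, max_insertions)

    augmented = list(words)
    for _ in range(n_insertions):
        # Any gap is allowed, including before the first and after the last token.
        position = rng.randint(0, len(augmented))
        augmented.insert(position, rng.choice(PUNCTUATION))

    # No word is deleted or reordered; only punctuation tokens are added.
    return " ".join(augmented)


if __name__ == "__main__":
    print(aeda_augment("a sad superior human comedy played out on the back roads of life"))
```

Because words are never removed or swapped, stripping the inserted marks from the output recovers the original sentence, which is the information-preserving property the abstract contrasts with EDA's deletion operation.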

Akbar Karimi, Leonardo Rossi, Andrea Prati • 2021

Related benchmarks

Task                      Dataset                        Result              Rank
Question Answering        SQuAD v1.1 (dev)               F1 Score 32.68      375
Question Answering        NewsQA (dev)                   F1 Score 61.78      101
Sequence Classification   ATIS                           Micro F1 97.63      64
Sequence Classification   IMDB                           Micro F1 89.33      64
Sequence Classification   MASSIVE                        Micro F1 79.11      64
Sequence Classification   Yahoo                          Micro F1 56.02      64
Sequence Classification   Huffpost low-resource (test)   Micro F1 81.1       64
Paraphrase Detection      QQP (test)                     --                  51
Sentence Similarity       MRPC (test)                    F1 (micro) 77.44    44
Text Classification       SST2                           Accuracy 0.9176     10
Showing 10 of 14 rows
