MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification

About

This paper presents MixText, a semi-supervised learning method for text classification, which uses our newly designed data augmentation method called TMix. TMix creates a large number of augmented training samples by interpolating text in hidden space. Moreover, we leverage recent advances in data augmentation to guess low-entropy labels for unlabeled data, hence making them as easy to use as labeled data. By mixing labeled, unlabeled and augmented data, MixText significantly outperformed current pre-trained and fine-tuned models and other state-of-the-art semi-supervised learning methods on several text classification benchmarks. The improvement is especially prominent when supervision is extremely limited. We have publicly released our code at https://github.com/GT-SALT/MixText.

Jiaao Chen, Zichao Yang, Diyi Yang • 2020
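
The two ideas named in the abstract, interpolating in hidden space and guessing low-entropy labels, can be illustrated with a minimal sketch. It is not the released implementation. It assumes mixup-style linear interpolation with a Beta-sampled coefficient and temperature sharpening of a predicted distribution. The function names `mix_hidden` and `sharpen`, the Beta parameter, the temperature, and the toy vectors are all hypothetical choices made for illustration.

```python
import random


def mix_hidden(h_a, h_b, lam):
    """Linearly interpolate two equal-length vectors (hidden states or label vectors)."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(h_a, h_b)]


def sharpen(probs, temperature=0.5):
    """Lower the entropy of a probability distribution (assumes temperature < 1)."""
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


random.seed(0)

# Toy stand-ins for the hidden representations of two texts (assumed values).
h_a = [0.2, -1.0, 0.5, 0.0]
h_b = [1.0, 0.4, -0.5, 2.0]
# One-hot labels for a two-class problem.
y_a = [1.0, 0.0]
y_b = [0.0, 1.0]

# Assumed: the mixing coefficient is drawn from Beta(alpha, alpha) and kept >= 0.5
# so the mixed sample stays closer to the first input.
alpha = 0.75
lam = random.betavariate(alpha, alpha)
lam = max(lam, 1.0 - lam)

h_mix = mix_hidden(h_a, h_b, lam)  # augmented hidden representation
y_mix = mix_hidden(y_a, y_b, lam)  # label mixed with the same coefficient

# Assumed: a guessed label for an unlabeled text is a sharpened model prediction.
guess = sharpen([0.6, 0.4])
```

In this sketch the same coefficient mixes both the representations and the labels, so `y_mix` remains a valid distribution, and `sharpen` pushes probability mass toward the most likely class.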

Related benchmarks

Task	Dataset	Result
Text Classification	AG News (test)	Accuracy 91.51	293
Text Classification	Yahoo! Answers (test)	Clean Accuracy 74.1	133
Ontology Classification	DBPedia (test)	Accuracy 99.2	53
Sentence Classification	Amazon Review (test)	Accuracy 92.79	15
