MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification

About

This paper presents MixText, a semi-supervised learning method for text classification, which uses our newly designed data augmentation method called TMix. TMix creates a large amount of augmented training samples by interpolating text in hidden space. Moreover, we leverage recent advances in data augmentation to guess low-entropy labels for unlabeled data, hence making them as easy to use as labeled data. By mixing labeled, unlabeled and augmented data, MixText significantly outperformed current pre-trained and fine-tuned models and other state-of-the-art semi-supervised learning methods on several text classification benchmarks. The improvement is especially prominent when supervision is extremely limited. We have publicly released our code at https://github.com/GT-SALT/MixText.
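
As a rough illustration of what "interpolating text in hidden space" can look like, the sketch below linearly mixes the per-token hidden vectors of two examples along with their label distributions. It is a minimal sketch and not the released implementation: the function names (`mix`, `tmix_sketch`), the `alpha` default, the Beta-distributed mixing weight, and the step that keeps the weight at or above 0.5 are all assumptions borrowed from common mixup practice, since the abstract does not specify them.

```python
import random


def mix(a, b, lam):
    """Linearly interpolate two equal-length vectors: lam * a + (1 - lam) * b."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def tmix_sketch(hidden_a, hidden_b, label_a, label_b, alpha=0.75, rng=None):
    """Hypothetical hidden-space interpolation of two examples.

    hidden_a, hidden_b: lists of per-token hidden vectors (same shape),
        standing in for the output of some intermediate encoder layer.
    label_a, label_b: label distributions (e.g. one-hot lists).
    alpha: assumed Beta(alpha, alpha) parameter for the mixing weight.
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    # Assumption: keep the mixed sample closer to the first input.
    lam = max(lam, 1.0 - lam)
    mixed_hidden = [mix(tok_a, tok_b, lam) for tok_a, tok_b in zip(hidden_a, hidden_b)]
    mixed_label = mix(label_a, label_b, lam)
    return mixed_hidden, mixed_label, lam


if __name__ == "__main__":
    h_a = [[1.0, 0.0], [0.0, 1.0]]  # toy hidden states, 2 tokens x 2 dims
    h_b = [[0.0, 0.0], [1.0, 1.0]]
    hidden, label, lam = tmix_sketch(h_a, h_b, [1.0, 0.0], [0.0, 1.0])
    print(lam, hidden, label)
```

In a real model the mixed hidden states would be passed through the remaining encoder layers and trained against the mixed label; the toy lists here only stand in for those tensors.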

Jiaao Chen, Zichao Yang, Diyi Yang • 2020

Related benchmarks

Task                     Dataset                Result                Rank
Text Classification      AG News (test)         Accuracy 91.51        210
Text Classification      Yahoo! Answers (test)  Clean Accuracy 74.1   133
Ontology Classification  DBPedia (test)         Accuracy 99.2         53
Sentence Classification  Amazon Review (test)   Accuracy 92.79        15
