Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach

About

Zero-shot text classification (0Shot-TC) is a challenging NLU problem that has received little attention from the research community. 0Shot-TC aims to associate an appropriate label with a piece of text, irrespective of the text domain and the aspect (e.g., topic, emotion, event) described by the label. Only a few articles study 0Shot-TC, and all of them focus on topical categorization, which, we argue, is just the tip of the iceberg in 0Shot-TC. In addition, the inconsistent experimental setups in the literature allow no uniform comparison, which obscures progress. This work benchmarks the 0Shot-TC problem by providing unified datasets, standardized evaluations, and state-of-the-art baselines. Our contributions include: i) The datasets we provide facilitate studying 0Shot-TC relative to conceptually different and diverse aspects: the "topic" aspect includes "sports" and "politics" as labels; the "emotion" aspect includes "joy" and "anger"; the "situation" aspect includes "medical assistance" and "water shortage". ii) We extend the existing evaluation setup (label-partially-unseen), in which a model is trained on some labels of a dataset and tested on all labels, to include a more challenging yet realistic setup, label-fully-unseen 0Shot-TC (Chang et al., 2008), which aims to classify text snippets without seeing any task-specific training data. iii) We unify the 0Shot-TC of diverse aspects within a textual entailment formulation and study it this way. Code & Data: https://github.com/yinwenpeng/BenchmarkingZeroShot

Wenpeng Yin, Jamaal Hay, Dan Roth • 2019
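
The textual entailment formulation mentioned in the abstract can be illustrated with a minimal sketch: each candidate label is turned into a hypothesis sentence, the input text serves as the premise, and the label whose hypothesis scores highest is predicted. The template wordings, the function names, and the word-overlap scorer below are all assumptions made for illustration; they are not the authors' implementation. In practice the toy scorer would be replaced by a trained entailment model.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical hypothesis templates, one per aspect (wording is an assumption).
TEMPLATES: Dict[str, str] = {
    "topic": "This text is about {label}.",
    "emotion": "This text expresses {label}.",
    "situation": "The people there need {label}.",
}


def build_pairs(text: str, labels: List[str], aspect: str) -> List[Tuple[str, str]]:
    """Turn one text and its candidate labels into (premise, hypothesis) pairs."""
    template = TEMPLATES[aspect]
    return [(text, template.format(label=label)) for label in labels]


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Stand-in scorer (NOT a real entailment model): the fraction of
    hypothesis words that also occur in the premise."""
    premise_words = set(premise.lower().replace(".", "").split())
    hypothesis_words = hypothesis.lower().replace(".", "").split()
    matches = sum(word in premise_words for word in hypothesis_words)
    return matches / len(hypothesis_words)


def classify(
    text: str,
    labels: List[str],
    aspect: str,
    scorer: Callable[[str, str], float] = toy_entailment_score,
) -> str:
    """Predict the label whose hypothesis the scorer rates highest.
    No task-specific training data is used, as in the label-fully-unseen setup."""
    pairs = build_pairs(text, labels, aspect)
    scores = [scorer(premise, hypothesis) for premise, hypothesis in pairs]
    best_index = max(range(len(labels)), key=scores.__getitem__)
    return labels[best_index]


if __name__ == "__main__":
    print(classify("The sports team won the final match.", ["sports", "politics"], "topic"))
```

Because the scorer is passed in as a parameter, the same `classify` function covers the topic, emotion, and situation aspects; only the template and the label set change.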

Related benchmarks

Task	Dataset	Result
Sentiment Analysis	IMDB (test)	Accuracy 91.1	306
Intent Classification	Banking77 (test)	Accuracy 42.2	196
Sentiment Analysis	SST-5 (test)	Accuracy 48.8	177
Topic Classification	AG News (test)	Accuracy 78	116
Topic Classification	DBPedia (test)	Accuracy 73	64
Sentiment Classification	Yelp (test)	Accuracy 73.5	46
Intent Classification	Snips (test)	Accuracy 61.4	40
Topic Classification	Yahoo (test)	Accuracy 48.2	36
Sentiment Analysis	Yelp (test)	Accuracy 75.2	29
Sentiment Analysis	Financial Phrase Bank (test)	Accuracy 0.402	24

