Towards Efficient NLP: A Standard Evaluation and A Strong Baseline

About

Supersized pre-trained language models have pushed the accuracy of various natural language processing (NLP) tasks to a new state of the art (SOTA). Rather than pursuing the unreachable SOTA accuracy, more and more researchers are paying attention to model efficiency and usability. Unlike accuracy, the metric for efficiency varies across studies, making them hard to compare fairly. To that end, this work presents ELUE (Efficient Language Understanding Evaluation), a standard evaluation and public leaderboard for efficient NLP models. ELUE is dedicated to depicting the Pareto frontier for various language understanding tasks, so that it can tell whether, and by how much, a method achieves a Pareto improvement. Along with the benchmark, we also release a strong baseline, ElasticBERT, which allows BERT to exit at any layer in both static and dynamic ways. We demonstrate that ElasticBERT, despite its simplicity, outperforms or performs on par with SOTA compressed and early-exiting models. With ElasticBERT, the proposed ELUE has a strong Pareto frontier and makes for a better evaluation of efficient NLP models.
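The notion of a Pareto frontier used above can be illustrated with a minimal sketch. It assumes each model is summarized by a single cost number (for example FLOPs) and a single accuracy number. The model names, the values, and the helper names `pareto_frontier` and `extends_frontier` are hypothetical and are not taken from ELUE itself.

```python
from typing import List, Tuple

# (model name, cost such as FLOPs, accuracy); all values below are made up.
Point = Tuple[str, float, float]


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Keep the models that no other model beats on both cost and accuracy."""
    # Sort by cost ascending; for equal cost, put the most accurate first.
    ordered = sorted(points, key=lambda p: (p[1], -p[2]))
    frontier: List[Point] = []
    best_acc = float("-inf")
    for p in ordered:
        # A model joins the frontier only if it is more accurate than
        # every cheaper (or equally cheap) model seen so far.
        if p[2] > best_acc:
            frontier.append(p)
            best_acc = p[2]
    return frontier


def extends_frontier(candidate: Point, frontier: List[Point]) -> bool:
    """True if no frontier model is at least as cheap and at least as accurate."""
    _, cost, acc = candidate
    return not any(f_cost <= cost and f_acc >= acc for _, f_cost, f_acc in frontier)


# Hypothetical models for illustration only.
models = [
    ("A", 1.0, 0.80),
    ("B", 2.0, 0.85),
    ("C", 2.5, 0.83),  # dominated by B: costlier and less accurate
    ("D", 4.0, 0.90),
]
frontier = pareto_frontier(models)
frontier_names = [name for name, _, _ in frontier]
```

Under these assumptions, a new method at cost 1.5 and accuracy 0.86 would extend the frontier, while one at cost 3.0 and accuracy 0.84 would not, because model B is both cheaper and more accurate.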

Xiangyang Liu, Tianxiang Sun, Junliang He, Jiawen Wu, Lingling Wu, Xinyu Zhang, Hao Jiang, Zhao Cao, Xuanjing Huang, Xipeng Qiu • 2021

Related benchmarks

Task	Dataset	Metric	Value	
Natural Language Inference	SNLI (test)	Accuracy	-2.7	694
Subjectivity Classification	Subj	Accuracy	48.85	343
Sentiment Analysis	IMDB (test)	Accuracy	-2.5	306
Question Classification	TREC	Accuracy	17.6	274
Sentiment Analysis	MR	Accuracy	0.4925	160
Sentiment Analysis	CR	Accuracy	48.65	141
Sentiment Analysis	SST-5	Accuracy	22.35	123
Natural Language Inference	SciTail (test)	Accuracy	-0.1	86
Natural Language Understanding	GLUE (test)	QNLI	93.6	75
Paraphrase Detection	QQP (test)	Accuracy	-0.2	51

Showing 10 of 18 rows
