SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
About
In the last year, new models and methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks. The GLUE benchmark, introduced a little over one year ago, offers a single-number metric that summarizes progress on a diverse set of such tasks, but performance on the benchmark has recently surpassed the level of non-expert humans, suggesting limited headroom for further research. In this paper we present SuperGLUE, a new benchmark styled after GLUE with a new set of more difficult language understanding tasks, a software toolkit, and a public leaderboard. SuperGLUE is available at super.gluebenchmark.com.
Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman • 2019
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Natural Language Understanding | SuperGLUE | SGLUE Score | 89.8 | 84 |
| Natural Language Understanding | SuperGLUE (test) | BoolQ Accuracy | 89 | 63 |
| Word Sense Disambiguation | WiC v1.0 (test) | Accuracy | 69.6 | 19 |
| Word Sense Disambiguation | SemEval-SS standardized (test) | Accuracy | 81.1 | 8 |