| Dataset Name | SOTA Method | Metric | Score | Trend | Last Updated |
|---|---|---|---|---|---|
| Standard Zero-shot NLU Suite (ARC-challenge, ARC-easy, BoolQ, HellaSwag, LAMBADA, PIQA, RACE, SciQ, Record, OBQA) | MaxScore | ARC Challenge | 21.22 | 18 | 3mo ago |
| LM-Evaluation-Harness (ARC, BoolQ, HellaSwag, LAMBADA, PIQA, RACE, SciQ, Record, OBQA) | | ARC Challenge | 46.8 | 13 | 3mo ago |
| NLU Benchmark Suite (CMNLI, HeSW, PIQA, WSC, CoQA, BoolQ, Race-M, Race-H, XSum, C3) | LaCo* | CMNLI Accuracy | 34.43 | 8 | 1mo ago |
| T0 (test) | T0-3B | Accuracy | 65.5 | 8 | 3mo ago |