| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| LM Evaluation Harness 0-shot | LO-BCQ | WG 80.66 | 30 | 4d ago | |
| BLiMP (test) | SwitchHead | Accuracy 79.6 | 8 | 4d ago | |
| Prominent Language Benchmarks (ARC, BoolQ, HellaSwag, OpenBookQA, PIQA, SciQ, TriviaQA, Winogrande) | Xmodel-LM 1.1B | ARC-Challenge Acc 28.16 | 5 | 4d ago | |
| CBT (test) | SwitchHead MAC-matched | Accuracy 84.2 | 4 | 4d ago | |
| Perplexity-based tasks (Wikitext, LAMBADA) zero-shot | | Wikitext Perplexity 25.46 | 2 | 4d ago | |