| Dataset Name | SOTA Method | Metric | Value | | Updated |
|---|---|---|---|---|---|
| MMLU, GSM8k, HellaSwag, WinoGrande | | MMLU Accuracy | 72.98 | 17 | 1mo ago |
| Open LLM Leaderboard | | ARC | 70.22 | 14 | 1mo ago |
| LM Evaluation Harness | TransMLA | ARC | 53.77 | 11 | 1mo ago |
| TinyStories | go-mHC | Grammar | 6.63 | 5 | 15d ago |
| Eight benchmark LLM tasks | Heterogeneous Digital-AIMC framework | Throughput (Tokens/s) | 49,781.23 | 5 | 1mo ago |
| Bolmo 1B evaluation suite | BLT 1B | Overall Average Score | 58.5 | 5 | 1mo ago |
| ARC, HellaSwag, MMLU, TruthfulQA, WinoGrande | BOFT | ARC Accuracy | 34.64 | 4 | 1mo ago |