| Dataset Name | SOTA Method | Metric | Value | | Updated |
|---|---|---|---|---|---|
| LM Evaluation Harness: MMLU, ARC-Challenge, HellaSwag, TruthfulQA, Winogrande, GSM8K (standard) | | MMLU | 65.8 | 16 | 4d ago |
| OpenLLM Leaderboard v1 (test) | SelectiveDPO | MMLU (5-shot) | 63.95 | 14 | 4d ago |
| 15 Downstream Tasks summary | MPP-B | Median EG | 2 | 7 | 4d ago |
| Downstream Tasks Aggregate | SPIRALFORMER-L | Accuracy | 54.37 | 3 | 4d ago |
| MNLI, SCIQ, LAMBADA, HellaSwag, ARC, MMLU | FusedKV | MNLI Acc | 0.3852 | 2 | 4d ago |