| Dataset Name | SOTA Method | Metric | Score | Trend | Last Updated |
|---|---|---|---|---|---|
| Open LLM Leaderboard | DPO | ARC | 82.8 | 33 | 1mo ago |
| Multi-benchmark Suite (AGIEval, GSM8K, MATH, Natural Questions, SimpleQA, TriviaQA, SuperGPQA) (cumulative) | MTP-D | AGIEval (EN) | 90.98 | 20 | 23d ago |
| BigBench (Lamb, SQuAD, CoQA, BBH, LSAT, LangID) | KromHC | Avg Score | 24 | 8 | 1mo ago |
| LLM Evaluation Suite (ARC, BBH, HellaSwag, TruthfulQA, LAMBADA, WinoGrande, GSM8K, MT-Bench) | BitDelta | ARC (Accuracy) | 54.61 | 3 | 1mo ago |