| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| MMLU | M2CL | Accuracy | 99.7 | 842 | 3d ago |
| MMLU | | Accuracy | 73 | 101 | 2d ago |
| MMLU | SA-SFT | Accuracy | 74.4 | 87 | 3d ago |
| MMLU (test) | | Normalized Accuracy | 90.46 | 76 | 3d ago |
| MMLU | | Accuracy (5-shot) | 78.74 | 31 | 3d ago |
| MMLU | GPT-4 | MMLU Score | 86.4 | 28 | 3d ago |
| MMLU-Pro | T2 | Latency (s) | 3.3 | 24 | 3d ago |
| LM Evaluation Harness (test) | GQA | ARC Challenge Acc | 44.28 | 24 | 3d ago |
| MMLU-Redux | HieraMAS | Accuracy | 95.2 | 22 | 3d ago |
| CMMLU | | Accuracy | 89.28 | 22 | 3d ago |
| MMLU-Pro | | Pass@1 | 57.8 | 21 | 3d ago |
| MMLU | DAAO | Accuracy | 84.9 | 20 | 3d ago |
| MMLU Pro | d2Cache | Throughput | 10.12 | 16 | 3d ago |
| MMLU-pro | PRD | Average Context Length (tokens) | 14,946.31 | 16 | 3d ago |
| MMLU-Pro | DIVER | Accuracy | 56.6 | 14 | 3d ago |
| MMLU | LESS | MMLU Score | 64.29 | 14 | 3d ago |
| MMLU | | MMLU Score | 69.5 | 14 | 3d ago |
| MMLU Pro | Qwen3-Next-80B-A3B-Thinking | MMLU Pro Engineering Acc | 76.88 | 13 | 3d ago |
| CEval | DeepSeek Chat 7B | Accuracy | 44.7 | 13 | 3d ago |
| MMLU-Redux (unseen categories) | | Accuracy | 72 | 10 | 3d ago |
| MMLU | MobileFineTuner | Loss (FT) | 1.33 | 10 | 3d ago |
| Global MMLU-Lite Māori | Gemma-3 (27B-IT) | Accuracy | 54.64 | 10 | 2d ago |
| BIG-bench | Gopher | Hindu Knowledge | 80 | 10 | 2d ago |
| MMLU | P(True) | Comparable SC Samples | 47 | 8 | 2d ago |
| AVG Across All Benchmarks | d2Cache | Throughput | 12.89 | 8 | 3d ago |