| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| MMLU | M2CL | Accuracy | 99.7 | 881 | 1mo ago |
| MMLU | | MMLU Accuracy | 98.5 | 442 | 20h ago |
| MMLU | | Accuracy | 94.7 | 353 | 1mo ago |
| MMLU | | Accuracy | 77.6 | 136 | 7d ago |
| MMLU | GPT-4 | MMLU Score | 86.4 | 116 | 6d ago |
| MMLU (test) | | Normalized Accuracy | 90.46 | 87 | 4d ago |
| MMLU | QWEN-3-30B MOE | MMLU Score | 80.2 | 86 | 5d ago |
| MMLU-Pro | SMCS | Accuracy | 82.05 | 64 | 15d ago |
| MMLU-Pro | | Pass@1 | 92.85 | 64 | 1mo ago |
| MMLU Pro | | Accuracy | 96.8 | 57 | 22d ago |
| MMLU-Redux | HieraMAS | Accuracy | 95.2 | 48 | 14d ago |
| MMLU Pro | Qwen3-Next-80B-A3B-Thinking | MMLU Pro Engineering Acc | 76.88 | 41 | 14d ago |
| MMLUpro (test) | SIGMA | Accuracy | 95.71 | 36 | 6d ago |
| MMLU | | Accuracy (5-shot) | 78.74 | 31 | 3mo ago |
| MMLU | CoT | Average Inference Time (s) | 2.07 | 30 | 2mo ago |
| MMLU-M | ZipCal | Accuracy | 28.79 | 26 | 2mo ago |
| MMLU-Pro | VecCISC + KMeans | Best Accuracy | 71.4 | 25 | 22d ago |
| MMLU-Pro | T2 | Latency (s) | 3.3 | 24 | 3mo ago |
| CMMLU | | Accuracy | 89.28 | 24 | 1mo ago |
| LM Evaluation Harness (test) | GQA | ARC Challenge Acc | 44.28 | 24 | 3mo ago |
| MMSU | SALAD-7B | Accuracy | 71.6 | 23 | 25d ago |
| MMLU | TAD | PRR | 62.5 | 22 | 1mo ago |
| CEval | | Accuracy | 82.5 | 22 | 20h ago |
| MMLU | MONA | MMLU Score | 63.73 | 21 | 4d ago |
| MMLU | Seq. Post. | Error Rate (ER) | 0.087 | 21 | 12d ago |