| Dataset Name | SOTA Method | Metric | Value | Trend | |
|---|---|---|---|---|---|
| MMLU | M2CL | Accuracy | 99.7 | 876 | 4d ago |
| MMLU | | Accuracy | 94.7 | 321 | 9d ago |
| MMLU | GPT-4 | MMLU Score | 86.4 | 112 | 24d ago |
| MMLU | | Accuracy | 73 | 111 | 1mo ago |
| MMLU (test) | | Normalized Accuracy | 90.46 | 76 | 1mo ago |
| MMLU-Pro | | Pass@1 | 92.85 | 64 | 4d ago |
| MMLU | CascadeDebate | MMLU Accuracy | 82.67 | 59 | 3d ago |
| MMLU-Pro | HuggingGPT | Accuracy | 65.59 | 55 | 2d ago |
| MMLU-Redux | HieraMAS | Accuracy | 95.2 | 44 | 1mo ago |
| MMLU | | Accuracy (5-shot) | 78.74 | 31 | 1mo ago |
| MMLU | CoT | Average Inference Time (s) | 2.07 | 30 | 24d ago |
| MMLU Pro | Llama 3.1 Instruct | Accuracy | 66.3 | 28 | 4d ago |
| MMLU-M | ZipCal | Accuracy | 28.79 | 26 | 1mo ago |
| MMLU-Pro | T2 | Latency (s) | 3.3 | 24 | 1mo ago |
| LM Evaluation Harness (test) | GQA | ARC Challenge Acc | 44.28 | 24 | 1mo ago |
| CMMLU | | Accuracy | 89.28 | 22 | 1mo ago |
| MMLU Pro (test) | NExt | History Score | 65.1 | 20 | 4d ago |
| MMLU | | MMLU Score | 60.87 | 20 | 9d ago |
| MMLU | DAAO | Accuracy | 84.9 | 20 | 1mo ago |
| MMLU-Pro | DSMoE | Math Score | 87.5 | 16 | 10d ago |
| MMLU-IT | Qwen3-30B-A3B | Accuracy | 81.5 | 16 | 1mo ago |
| MMLU Pro | d2Cache | Throughput | 10.12 | 16 | 1mo ago |
| MMLU-pro | PRD | Average Context Length (tokens) | 14,946.31 | 16 | 1mo ago |
| MMLU | | MMLU Accuracy | 92 | 14 | 9d ago |
| MMLU Pro | RL | Accuracy | 57.6 | 14 | 1mo ago |