| Dataset Name | SOTA Method | Metric | Trend | |
|---|---|---|---|---|
| MMLU | Llama-3.3-70b-Instruct | Accuracy 82.4 | 33 | 1mo ago |
| MMLU, MMLU-pro, SuperGPQA, LPFQA | | Pass@1 Score 93.8 | 20 | 25d ago |
| JARVIS-VLA Benchmark 1.0 (test) | GPT-4o | Accuracy 96.6 | 10 | 1mo ago |
| MMLU | Sigma-MoE-Tiny Base | EM (World Knowledge) 64.81 | 4 | 1mo ago |
| HUMANITY’S LAST EXAM text-only | | Score 11.1 | 4 | 1mo ago |
| GPQA Diamond | | Score 77.5 | 4 | 1mo ago |
| MMLU-PRO | DeepSeek-V3.2 | Score 84.6 | 4 | 1mo ago |
| MMLU-Pro | Sigma-MoE-Tiny Base | EM (World Knowledge) 38.13 | 3 | 1mo ago |