| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| MMLU | Llama-3.3-70b-Instruct | Accuracy 82.4 | 39 | 8d ago | |
| MMLU, MMLU-pro, SuperGPQA, LPFQA | | Pass@1 Score 93.8 | 20 | 2mo ago | |
| JARVIS-VLA Benchmark 1.0 (test) | GPT-4o | Accuracy 96.6 | 10 | 3mo ago | |
| KVFundaBench WK | | Accuracy 72.99 | 5 | 23d ago | |
| MMLU | Sigma-MoE-Tiny Base | EM (World Knowledge) 64.81 | 4 | 3mo ago | |
| HUMANITY’S LAST EXAM text-only | | Score 11.1 | 4 | 3mo ago | |
| GPQA Diamond | | Score 77.5 | 4 | 3mo ago | |
| MMLU-PRO | DeepSeek-V3.2 | Score 84.6 | 4 | 3mo ago | |
| MMLU-Pro | Sigma-MoE-Tiny Base | EM (World Knowledge) 38.13 | 3 | 3mo ago | |
| ARC-c | SYNPRO | Accuracy (ARC-c) 41.47 | 1 | 15d ago | |
| ARC-e | SYNPRO | Accuracy 70.18 | 1 | 15d ago | |