| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| MMLU-Pro | Anchored Learning | Accuracy 34.3 | 43 | 27d ago | |
| GPQA D | Length Scaling | Accuracy 69.9 | 33 | 2mo ago | |
| GPQA | | Accuracy 85.7 | 25 | 3mo ago | |
| MMLU-Pro (test) | Base | Accuracy 41.9 | 18 | 27d ago | |
| GPQA Diamond | | Accuracy 63.1 | 15 | 19d ago | |
| ARC-Challenge, ARC-Easy, OpenBookQA | MobileMoE-L | ARC-C Accuracy 57.9 | 13 | 7d ago | |
| GPQA | CapFlow | Solve Rate 42.41 | 11 | 3mo ago | |
| ARC-E | MobileMoE-L | Accuracy 81.7 | 7 | 7d ago | |
| OBQA | MobileLLM-Pro | Accuracy 42.8 | 4 | 7d ago | |
| ARC-C | MobileMoE-L | Accuracy 55.1 | 4 | 7d ago | |
| GPQA D | AIPO | Avg@4 72.55 | 4 | 22d ago | |
| GPQA Diamond | DeepSeek-R1 | Pass@1 Score 71.5 | 4 | 1mo ago | |
| ChemBench | AgentSPEX | Score 83.3 | 3 | 1mo ago | |
| StemEZ physical chemistry MMLU-Pro | AgentSPEX | Score 86.57 | 3 | 1mo ago | |
| SciBench chemistry | AgentSPEX | Score 90.61 | 3 | 1mo ago | |