| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| MQA | Phi-4-14B | First-Token Accuracy | 45 | 24 | 13d ago |
| SciQ | Llama-3.1-8B | First-Token Accuracy | 98.3 | 24 | 13d ago |
| ARC-C 24 official EU languages | Qwen-3-32B | Score | 93.1 | 14 | 1mo ago |
| GPQA Main | Qwen3-4B-Inst-2507 | Accuracy | 0.181 | 5 | 1mo ago |