| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| TriviaQA | | Exact Match | 71.58 | 20 | 1mo ago |
| MMLU Redux | | MMLU Redux Score | 93.3 | 15 | 21d ago |
| ARC-Challenge, ENEM, BLUEX, OAB Exams, BELEBELE, MMLU, GSM8K-PT | Tucano2-qwen-3.7B-Instruct | K&R Score (NPM) | 56.22 | 14 | 3mo ago |
| MMLU-Redux | Sigma-MoE-Tiny | Accuracy | 79.8 | 9 | 3mo ago |
| MedXpertQA text + MM | Gemini 3 Pro | Accuracy (MedXpertQA Text+MM) | 74.4 | 6 | 1mo ago |
| MMMU Pro | | Score | 81 | 5 | 2mo ago |
| CCPM | Engram-40B | Accuracy | 87.7 | 4 | 3mo ago |
| PopQA | Engram-40B | Exact Match | 21.2 | 4 | 3mo ago |
| TriviaQA-ZH | Engram-40B | EM | 77.9 | 4 | 3mo ago |
| CMMLU | Engram-40B | Accuracy | 63.4 | 4 | 3mo ago |