| Dataset Name | SOTA Method | Metric | Value | Trend | Last Updated |
|---|---|---|---|---|---|
| MATH-500, MMLU-Redux, and SimpleQA (Averaged) | gemini-2.5-pro | Accuracy | 82.57 | 53 | 1mo ago |
| Open LLM Leaderboard (test) | | Average Score | 70.1 | 21 | 1mo ago |
| AlignBench | Qwen2.5-14B | Reasoning Score | 7.27 | 20 | 1mo ago |
| Instruction-Following, Mathematics, and Commonsense Reasoning Combined | Qwen2.5 7B-PC | Average Score | 57 | 18 | 1mo ago |
| Overall | UM-190k | Overall Score | 38.74 | 9 | 1mo ago |
| MT-Bench zh | Qwen2.5-14B | Overall Score | 6.66 | 7 | 1mo ago |
| MT-Bench | | Writing | 9.25 | 4 | 1mo ago |
| XSTest | MTSA-T3 | Refusal Rate | 23.1 | 4 | 1mo ago |