| Dataset Name | SOTA Method | Metric | Value | Trend | |
|---|---|---|---|---|---|
| MATH-500, MMLU-Redux, and SimpleQA (Averaged) | gemini-2.5-pro | Accuracy | 82.57 | 53 | 2mo ago |
| Open LLM Leaderboard (test) | | Average Score | 70.1 | 21 | 3mo ago |
| AlignBench | Qwen2.5-14B | Reasoning Score | 7.27 | 20 | 3mo ago |
| Instruction-Following, Mathematics, and Commonsense Reasoning Combined | Qwen2.5 7B-PC | Average Score | 57 | 18 | 2mo ago |
| Overall | UM-190k | Overall Score | 38.74 | 9 | 3mo ago |
| MT-Bench zh | Qwen2.5-14B | Overall Score | 6.66 | 7 | 3mo ago |
| MT-Bench | AdaDPO | Overall Score | 8.03 | 7 | 6d ago |
| XSTest | MTSA-T3 | Refusal Rate | 23.1 | 4 | 3mo ago |