| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| MMMLU (Massive Multilingual Language Understanding) | Qwen-3-30B-A3B | Accuracy | 79.5 | 21 | 4d ago |
| MMMLU | Task Arithmetic | Accuracy (Korean) | 60.5 | 20 | 4d ago |
| MMLU-ProX | Qwen-3-30B-A3B | Accuracy | 72 | 16 | 4d ago |
| M-MMLU (test) | REFLECT | zh Accuracy | 65.2 | 14 | 3d ago |
| French (HellaSwag, ARC-Challenge, XNLI, and MMLU) translated (test) | GenKnowSub | HellaSwag Accuracy | 57.83 | 8 | 4d ago |
| German (HellaSwag, ARC-Challenge, XNLI, and MMLU) translated (test) | Phi-3 | HellaSwag | 52.48 | 8 | 4d ago |
| Portuguese | Qwen2.5 | Average Performance | 64.6 | 6 | 4d ago |
| French | Qwen2.5 | Average Performance | 55.5 | 6 | 4d ago |
| Spanish | Gamayun | Average Performance | 55.7 | 6 | 4d ago |
| German | Qwen2.5 | Average Performance | 55.7 | 6 | 4d ago |
| Bulgarian | Gamayun | Average Performance | 48.4 | 6 | 4d ago |
| Arabic | Gamayun | Average Performance | 0.572 | 6 | 4d ago |
| Russian | Gamayun | Arc-ru | 34.9 | 6 | 4d ago |
| Chinese | Qwen2.5 | Average Performance | 68.7 | 5 | 4d ago |
| Thai | Qwen2.5 | Average Performance | 54.5 | 5 | 4d ago |
| Multilingual Understanding | Qwen2-72B | Accuracy | 80.7 | 5 | 4d ago |
| INCLUDE 5-shot | ERNIE 5.0-Base | Accuracy | 77.81 | 3 | 4d ago |
| MMMLU 5-shot | ERNIE 5.0-Base | Accuracy | 78.94 | 3 | 4d ago |