Multitask Language Understanding on MMLU-ProX non-EU languages (test)
[Chart: Accuracy of top models over time. Current best: Qwen-3-30B-A3B at 70.9 accuracy (Feb 5, 2026). Updated 4d ago.]
Evaluation Results
| Method | Release Type | Date | Accuracy |
|---|---|---|---|
| Qwen-3-30B-A3B | Open-weights | 2026.02 | 70.9 |
| Qwen-3-32B | Open-weights | 2026.02 | 69.0 |
| Qwen-3-14B | Open-weights | 2026.02 | 64.6 |
| Llama-3.3-70B | Open-weights | 2026.02 | 63.4 |
| Mistral-3.2-24B | Open-weights | 2026.02 | 62.5 |
| Gemma-3-27B | Open-weights | 2026.02 | 58.8 |
| OLMo-3.1-32B | Fully-open | 2026.02 | 55.1 |
| Gemma-3-12B | Open-weights | 2026.02 | 51.8 |
| EuroLLM-22B (new) | Fully-open | 2026.02 | 43.8 |
| OLMo-3-7B | Fully-open | 2026.02 | 40.5 |
| EuroLLM-9B (new) | Fully-open | 2026.02 | 36.5 |
| EuroLLM-22B (old) | Fully-open | 2026.02 | 36.4 |
| Apertus-70B | Fully-open | 2026.02 | 35.2 |
| Llama-3.1-8B | Open-weights | 2026.02 | 31.1 |
| Apertus-8B | Fully-open | 2026.02 | 28.6 |
| EuroLLM-9B (old) | Fully-open | 2026.02 | 27.9 |

Release-type labels were truncated on the page ("Open-weig...", "Fully-ope...") and are expanded here as "Open-weights" and "Fully-open".
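For anyone consuming this leaderboard programmatically, here is a minimal sketch that transcribes the table into Python and picks out the top entry per release type. The accuracy values come from the table above; the expanded release-type labels ("Open-weights" / "Fully-open") are an assumption, since the page truncates them.

```python
# Rows transcribed from the MMLU-ProX non-EU languages (test) leaderboard.
# Release-type labels were truncated in the source; "Open-weights" and
# "Fully-open" are assumed expansions.
ROWS = [
    ("Qwen-3-30B-A3B", "Open-weights", 70.9),
    ("Qwen-3-32B", "Open-weights", 69.0),
    ("Qwen-3-14B", "Open-weights", 64.6),
    ("Llama-3.3-70B", "Open-weights", 63.4),
    ("Mistral-3.2-24B", "Open-weights", 62.5),
    ("Gemma-3-27B", "Open-weights", 58.8),
    ("OLMo-3.1-32B", "Fully-open", 55.1),
    ("Gemma-3-12B", "Open-weights", 51.8),
    ("EuroLLM-22B (new)", "Fully-open", 43.8),
    ("OLMo-3-7B", "Fully-open", 40.5),
    ("EuroLLM-9B (new)", "Fully-open", 36.5),
    ("EuroLLM-22B (old)", "Fully-open", 36.4),
    ("Apertus-70B", "Fully-open", 35.2),
    ("Llama-3.1-8B", "Open-weights", 31.1),
    ("Apertus-8B", "Fully-open", 28.6),
    ("EuroLLM-9B (old)", "Fully-open", 27.9),
]

def best_by_release_type(rows):
    """Return {release_type: (model, accuracy)} for the top-scoring model in each group."""
    best = {}
    for name, rtype, acc in rows:
        if rtype not in best or acc > best[rtype][1]:
            best[rtype] = (name, acc)
    return best
```

For example, `best_by_release_type(ROWS)` shows the gap between the strongest open-weights model (Qwen-3-30B-A3B, 70.9) and the strongest fully-open model (OLMo-3.1-32B, 55.1) on this benchmark.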