Multilingual Multiple-Choice Reasoning on Global MMLU (42 languages, v1.0, test split)
[Chart: Average Accuracy over time. Current best: Qwen3.5-4B at 54.8 (Mar 12, 2026). Updated 1 month ago.]
Evaluation Results

| Method         | Evaluation setup          | Date    | Average Accuracy |
|----------------|---------------------------|---------|------------------|
| Qwen3.5-4B     | evaluation harness=lm-... | 2026.03 | 54.8             |
| Qwen3-4B       | evaluation harness=lm-... | 2026.03 | 49.3             |
| Ministral-3-3B | evaluation harness=lm-... | 2026.03 | 46.8             |
| Gemma3-4B      | evaluation harness=lm-... | 2026.03 | 45.3             |
| Tiny Aya Global| evaluation harness=lm-... | 2026.03 | 44.9             |
| SmolLM3-3B     | evaluation harness=lm-... | 2026.03 | 37.2             |
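The "Average Accuracy" metric is presumably the macro-average (unweighted mean) of per-language accuracy across the 42 Global MMLU languages. A minimal sketch of that aggregation, with hypothetical per-language scores for three of the languages (the function name and example values are illustrative, not taken from the leaderboard):

```python
def average_accuracy(per_language_acc: dict[str, float]) -> float:
    """Macro-average: unweighted mean of per-language accuracies.

    Each language contributes equally, regardless of how many
    questions its test split contains.
    """
    return sum(per_language_acc.values()) / len(per_language_acc)


# Hypothetical scores for 3 of the 42 languages, as fractions.
scores = {"en": 0.62, "de": 0.55, "sw": 0.41}
print(round(average_accuracy(scores) * 100, 1))  # → 52.7
```

A macro-average like this weights low-resource and high-resource languages equally, which is why a model strong only in English can still score low on this benchmark.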