Multilingual Multiple-Choice Reasoning on INCLUDE 44 languages 1.0 (test)
Best result: 56.9 Average Accuracy (Qwen3.5-4B)
[Chart: Average Accuracy; Mar 12, 2026]
Updated 1mo ago
Evaluation Results
Method           Details                      Date     Average Accuracy
Qwen3.5-4B       evaluation harness=lm-...    2026.03  56.9
Ministral-3-3B   evaluation harness=lm-...    2026.03  52.6
Qwen3-4B         evaluation harness=lm-...    2026.03  52.2
Gemma3-4B        evaluation harness=lm-...    2026.03  48.9
Tiny Aya Global  evaluation harness=lm-...    2026.03  45.1
SmolLM3-3B       evaluation harness=lm-...    2026.03  39.4