Commonsense Reasoning on Hellaswag non-EU languages (test)
[Chart: Accuracy over time. Top result: Mistral-3.2-24B, 80.4 accuracy; latest point dated Feb 5, 2026.]
Evaluation Results
| Method | Release Type | Date | Accuracy |
|---|---|---|---|
| Mistral-3.2-24B | Open-weig... | 2026.02 | 80.4 |
| Qwen-3-32B | Open-weig... | 2026.02 | 77.4 |
| Qwen-3-30B-A3B | Open-weig... | 2026.02 | 76.6 |
| Qwen-3-14B | Open-weig... | 2026.02 | 73.8 |
| Gemma-3-27B | Open-weig... | 2026.02 | 73.6 |
| Gemma-3-12B | Open-weig... | 2026.02 | 71.1 |
| Llama-3.3-70B | Open-weig... | 2026.02 | 70.2 |
| Apertus-70B | Fully-ope... | 2026.02 | 64.5 |
| EuroLLM-22B (new) | Fully-ope... | 2026.02 | 61.7 |
| EuroLLM-22B (old) | Fully-ope... | 2026.02 | 61.4 |
| EuroLLM-9B (old) | Fully-ope... | 2026.02 | 50.1 |
| Apertus-8B | Fully-ope... | 2026.02 | 48.7 |
| EuroLLM-9B (new) | Fully-ope... | 2026.02 | 47.3 |
| OLMo-3.1-32B | Fully-ope... | 2026.02 | 43.3 |
| Llama-3.1-8B | Open-weig... | 2026.02 | 36.6 |
| OLMo-3-7B | Fully-ope... | 2026.02 | 30.3 |