General Reasoning on M-HellaSwag 30 languages
[Chart: Macro Accuracy by date. Llama 3.1: 49.29 on May 21, 2026]
Updated 12d ago
Evaluation Results
Method | Base Model | Date | Macro Accuracy
Llama 3.1 | 8B Instruct | 2026.05 | 49.29
Cross-Lingual Consensus | Llama 3.1 8... | 2026.05 | 48.44