Hardened Language Understanding on MMLU-Pro (val)
[Chart: Accuracy over time. Top result: 38.6 (Single Best), May 28, 2026]
Evaluation Results
Method            Base model      Date      Accuracy
Single Best       Qwen2.5-1.5B    2026.05   38.6
CMA               Qwen2.5-1.5B    2026.05   35.7
Model Swarm       Qwen2.5-1.5B    2026.05   34.3
TIES              Qwen2.5-1.5B    2026.05   32.9
PSO-Merging       Qwen2.5-1.5B    2026.05   31.4
EvoGM             Qwen2.5-1.5B    2026.05   31.4
Model Soup        Qwen2.5-1.5B    2026.05   30.0
DARE              Qwen2.5-1.5B    2026.05   30.0
MTL               Qwen2.5-1.5B    2026.05   28.6
Task Arithmetic   Qwen2.5-1.5B    2026.05   28.6
Base              Qwen2.5-1.5B    2026.05   27.1