Hardened Language Understanding on MMLU-Pro (test)
[Chart: Accuracy (MMLU-Pro Test) by date. Top entry: Task Arithmetic, 23.4, May 28, 2026.]
Evaluation Results
Method            Base model      Date      Accuracy (MMLU-Pro Test)
Task Arithmetic   Qwen2.5-1.5B    2026.05   23.4
DARE              Qwen2.5-1.5B    2026.05   23.3
Single Best       Qwen2.5-1.5B    2026.05   23.2
Model Swarm       Qwen2.5-1.5B    2026.05   22.9
Model Soup        Qwen2.5-1.5B    2026.05   22.5
EvoGM             Qwen2.5-1.5B    2026.05   22.5
TIES              Qwen2.5-1.5B    2026.05   22.1
CMA               Qwen2.5-1.5B    2026.05   21.7
Base              Qwen2.5-1.5B    2026.05   20.7
MTL               Qwen2.5-1.5B    2026.05   20.7
PSO-Merging       Qwen2.5-1.5B    2026.05   19.7