
Hardened Language Understanding on MMLU-Pro (val)

Best result: 38.6 accuracy (Single Best)

Updated 1mo ago

Evaluation Results

Method	Date	Accuracy
Single Best	2026.05	38.6
CMA	2026.05	35.7
Model Swarm	2026.05	34.3
TIES	2026.05	32.9
PSO-Merging	2026.05	31.4
EvoGM	2026.05	31.4
Model Soup	2026.05	30.0
DARE	2026.05	30.0
MTL	2026.05	28.6
Task Arithmetic	2026.05	28.6
Base	2026.05	27.1