Hardened Language Understanding on MMLU-Pro (test)

23.4 Accuracy (MMLU-Pro Test)

Task Arithmetic

Updated 1mo ago

Evaluation Results

Method	Links	Accuracy (MMLU-Pro Test)
Task Arithmetic 2026.05		23.4
DARE 2026.05		23.3
Single Best 2026.05		23.2
Model Swarm 2026.05		22.9
Model Soup 2026.05		22.5
EvoGM 2026.05		22.5
TIES 2026.05		22.1
CMA 2026.05		21.7
Base 2026.05		20.7
MTL 2026.05		20.7
PSO-Merging 2026.05		19.7