Multi-task Language Understanding on MMLU-Pro AceReason (Complete)

76.5 Accuracy (MMLU-Pro AceReason)

Oracle-surrogate Greedy

Updated 2mo ago

Evaluation Results

Method	Links	Accuracy
Oracle-surrogate Greedy 2026.05		76.5
Truth-prediction Greedy 2026.05		75.5
Top-accuracy 2026.05		74.6
Model-first Greedy 2026.05		73.8
MoA 2026.05		73.7
Input-all 2026.05		72.4
Best-model 2026.05		72.2
Aya-judge 2026.05		71
Conditioned-diversity 2026.05		68.3
GPT5.2-judge 2026.05		56.5