Multi-task Language Understanding on MMLU-Pro AceReason (Reduced)

71.1 Accuracy

Model-first Greedy

Updated 2mo ago

Evaluation Results

Method	Links	Accuracy
Model-first Greedy 2026.05		71.1
Aya-judge 2026.05		69.2
Input-all 2026.05		68.7
MoA 2026.05		68.7
Oracle-surrogate Greedy 2026.05		68.7
Truth-prediction Greedy 2026.05		67.8
Conditioned-diversity 2026.05		66.9
Top-accuracy 2026.05		66.6
Best-model 2026.05		66.4
GPT5.2-judge 2026.05		56.8