Multi-task Language Understanding on MMLU-Pro AceReason (Complete)
[Chart: Accuracy (MMLU-Pro AceReason) over time. Highlighted point: Oracle-surrogate Greedy, 76.5, May 21, 2026.]
Evaluation Results
Method                    Configuration               Date      Accuracy (MMLU-Pro AceReason)
Oracle-surrogate Greedy   k=5, Summarizer=AceReason   2026.05   76.5
Truth-prediction Greedy   k=5, Summarizer=AceReason   2026.05   75.5
Top-accuracy              k=5, Summarizer=AceReason   2026.05   74.6
Model-first Greedy        k=5, Summarizer=AceReason   2026.05   73.8
MoA                       k=5, Summarizer=AceReason   2026.05   73.7
Input-all                 k=5, Summarizer=AceReason   2026.05   72.4
Best-model                k=5, Summarizer=AceReason   2026.05   72.2
Aya-judge                 k=5, Summarizer=AceReason   2026.05   71.0
Conditioned-diversity     k=5, Summarizer=AceReason   2026.05   68.3
GPT5.2-judge              k=5, Summarizer=AceReason   2026.05   56.5