Multi-task Language Understanding on MMLU-Pro AceReason (Reduced)
[Chart: Accuracy over time. Top entry: Model-first Greedy, 71.1 accuracy, May 21, 2026.]
Evaluation Results

Method                    Settings                    Date      Accuracy
Model-first Greedy        k=5, Summarizer=AceReason   2026.05   71.1
Aya-judge                 k=5, Summarizer=AceReason   2026.05   69.2
Input-all                 k=5, Summarizer=AceReason   2026.05   68.7
MoA                       k=5, Summarizer=AceReason   2026.05   68.7
Oracle-surrogate Greedy   k=5, Summarizer=AceReason   2026.05   68.7
Truth-prediction Greedy   k=5, Summarizer=AceReason   2026.05   67.8
Conditioned-diversity     k=5, Summarizer=AceReason   2026.05   66.9
Top-accuracy              k=5, Summarizer=AceReason   2026.05   66.6
Best-model                k=5, Summarizer=AceReason   2026.05   66.4
GPT5.2-judge              k=5, Summarizer=AceReason   2026.05   56.8