Mathematical Reasoning on AIME Aya (Reduced)
[Chart: Accuracy by date. Highlighted point: Model-first Greedy, 63.2 (May 21, 2026).]
Evaluation Results
Method                    Settings               Date      Accuracy
Model-first Greedy        k=5, Summarizer=Aya    2026.05   63.2
Input-all                 k=5, Summarizer=Aya    2026.05   62.1
MoA                       k=5, Summarizer=Aya    2026.05   56.2
Truth-prediction Greedy   k=5, Summarizer=Aya    2026.05   52.2
Oracle-surrogate Greedy   k=5, Summarizer=Aya    2026.05   49.7
Conditioned-diversity     k=5, Summarizer=Aya    2026.05   46.8
GPT5.2-judge              k=5, Summarizer=Aya    2026.05   45
Aya-judge                 k=5, Summarizer=Aya    2026.05   41.5
Top-accuracy              k=5, Summarizer=Aya    2026.05   36.4
Best-model                k=5, Summarizer=Aya    2026.05   33.2