Mathematical Reasoning on AIME Aya (Reduced)

63.2 Accuracy

Model-first Greedy

Updated 2mo ago

Evaluation Results

Method	Links	Accuracy
Model-first Greedy 2026.05		63.2
Input-all 2026.05		62.1
MoA 2026.05		56.2
Truth-prediction Greedy 2026.05		52.2
Oracle-surrogate Greedy 2026.05		49.7
Conditioned-diversity 2026.05		46.8
GPT5.2-judge 2026.05		45
Aya-judge 2026.05		41.5
Top-accuracy 2026.05		36.4
Best-model 2026.05		33.2