
Mathematical Reasoning on AIME 2025 (Accuracy and Token Costs)

[Chart: Sample Count (value shown: 40; series label: SC; date: Feb 10, 2026)]
Updated 3d ago

Evaluation Results

Method  Links
2026.02  40     -     35.6   6.67
2026.02  40     -     37.3   16.67
2026.02  40     -     164.1  16.67
2026.02  39.83  -     37.8   6.67
2026.02  37.9   4.9   35.1   6.67
2026.02  37.64  -     37.4   16.67
2026.02  36.73  -     42.4   6.67
2026.02  33.865.2     32.7   16.67
2026.02  32.72  -     136.3  16.67
2026.02  32.58  -     39.3   16.67
2026.02  30.13  -     29.4   10
2026.02  28.9617.7    119.8  16.67
2026.02  27.12  -     119.1  16.67
2026.02  26.2   -     27.6   13.33
2026.02  17.9   -     74.5   16.67