Mathematical Reasoning on AIME 2024 (Accuracy and Token Costs)

40 Sample Count

[Chart: SC; Feb 10, 2026]
Updated 3d ago

Evaluation Results

Method  Links
2026.02  40     -  42.3   0.112
2026.02  40     -  42.2   0.1667
2026.02  40     -  163.8  0.1333
2026.02  38.81  -  43     0.112
2026.02  37.854.941.30.112
2026.02  37.11  -  41.5   0.1667
2026.02  34.68  -  44.1   0.112
2026.02  34.11537.80.1667
2026.02  33.7   -  139.3  0.1333
2026.02  32.3   -  41.2   0.1667
2026.02  29.7817122.50.1333
2026.02  28.17  -  33.7   0.112
2026.02  27.3   -  116.7  0.1333
2026.02  24.7   -  29     0.1667
2026.02  18.8   -  78.1   0.1