
Mathematical Reasoning on AIME 2025 (Accuracy, Average output tokens)

Best reported result: 36.7 accuracy (SAPO)

Figure: Accuracy over time, Aug 5, 2025 to Feb 11, 2026

Evaluation Results

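| Date | Accuracy | Avg. output tokens |
|---|---|---|
| 2026.02 | 36.7 | - |
| 2026.02 | 34.2 | - |
| 2026.02 | 33.5 | - |
| 2026.02 | 28.8 | - |
| 2026.02 | 28.2 | - |
| 2026.02 | 27.4 | - |
| 2026.02 | 24.6 | - |
| 2026.02 | 21.4 | - |
| 2026.02 | 0.961 | 25,000 |
| 2026.02 | 0.95 | 15,000 |
| 2026.02 | 0.945 | 30,000 |
| 2026.02 | 0.931 | 16,000 |
| 2025.08 | 0.7692 | 19,902.43 |
| 2025.08 | 0.76 | 11,716.63 |
| 2026.02 | 0.7 | - |
| 2025.08 | 0.6667 | 23,711.6 |
| 2025.08 | 0.6552 | 13,434.17 |
| 2026.02 | 0.6 | - |
| 2025.08 | 0.5862 | 18,000.1 |
| 2025.08 | 0.5172 | 10,842.07 |
| 2026.02 | 0.5 | - |
| 2025.08 | 0.3571 | 18,203.23 |
| 2025.08 | 0.3571 | 11,471.17 |
| 2026.02 | 0.2 | - |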
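The Accuracy column appears to mix two scales. The upper rows look like percentages (36.7, 34.2, ...), while the lower rows look like fractions (0.961, 0.7692, ...). If that reading is right, ordering by the raw numbers does not reflect relative performance. The sketch below puts the values on one scale before ranking them. It assumes that values above 1 are percentages and values at or below 1 are fractions, which the page itself does not state. `normalize_accuracy` and `rank_results` are hypothetical helpers and are not part of any WorkDL tool.

```python
# Hypothetical helpers. ASSUMPTION: accuracy values > 1 are percentages and
# values <= 1 are fractions; the leaderboard does not state its scale.

def normalize_accuracy(value: float) -> float:
    """Return accuracy as a percentage in the range 0-100."""
    # Note: a value of exactly 1 is treated as a fraction (i.e. 100%).
    return value if value > 1 else value * 100


def rank_results(rows):
    """Sort (date, accuracy) rows by normalized accuracy, highest first."""
    return sorted(rows, key=lambda row: normalize_accuracy(row[1]), reverse=True)


# A few (date, accuracy) pairs copied from the table above.
rows = [
    ("2026.02", 36.7),
    ("2026.02", 21.4),
    ("2026.02", 0.961),
    ("2025.08", 0.7692),
    ("2026.02", 0.2),
]

for date, acc in rank_results(rows):
    print(f"{date}  {normalize_accuracy(acc):.1f}%")
```

Under this assumption, 0.961 (96.1%) ranks above 36.7, the reverse of the table's raw-number order. The threshold at 1 is the weak point of the heuristic: a genuine percentage below 1 would be misread as a fraction.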