
Mathematical Reasoning on AIME 24, 25 (Acc, Tok, CR, Cost)

Best accuracy: 76 (Vanilla)

Apr 6, 2026

Evaluation Results

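| Date | Acc | Tok | CR | Cost |
|---|---|---|---|---|
| 2026.04 | 76 | 15,137 | 100 | 15,137 |
| 2026.04 | 76 | 14,526 | 96 | 15,096 |
| 2026.04 | 71.3 | 11,086 | 73.2 | 11,295 |
| 2026.04 | 70.2 | 10,720 | 70.8 | 10,921 |
| 2026.04 | 69.1 | 16,326 | 100 | 16,326 |
| 2026.04 | 68.7 | 16,325 | 100 | 16,960 |
| 2026.04 | 68.1 | 13,115 | 80.3 | 13,400 |
| 2026.04 | 67.7 | 12,588 | 77.1 | 12,800 |
| 2026.04 | 66.1 | 12,666 | 83.7 | 13,130 |
| 2026.04 | 61.7 | 13,995 | 85.7 | 14,489 |
| 2026.04 | 56.7 | 14,130 | 86.5 | 27,343 |
| 2026.04 | 55 | 11,130 | 100 | 11,130 |
| 2026.04 | 54.9 | 10,729 | 96.4 | 11,169 |
| 2026.04 | 53.7 | 10,514 | 94.5 | 11,018 |
| 2026.04 | 52.7 | 9,631 | 86.5 | 9,847 |
| 2026.04 | 52.4 | 9,282 | 83.4 | 9,493 |
| 2026.04 | 52 | 9,394 | 84.4 | 18,991 |
| 2026.04 | 37.6 | 14,461 | 100 | 14,461 |
| 2026.04 | 37.3 | 14,478 | 100.1 | 15,003 |
| 2026.04 | 35 | 11,508 | 79.6 | 12,616 |
| 2026.04 | 34.7 | 10,649 | 73.6 | 11,608 |
| 2026.04 | 34.3 | 12,624 | 87.3 | 13,472 |
| 2026.04 | 31.9 | 4,341 | 39 | 5,939 |
| 2026.04 | 28.9 | 5,958 | 36.5 | 9,073 |
| 2026.04 | 23.4 | 6,328 | 43.8 | 8,934 |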
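The page does not define the CR column. Its values appear to match Tok expressed as a percentage of the Tok of one of the four rows that report CR = 100: 15,137, 16,326, 11,130 and 14,461 tokens. Presumably each of those rows is one baseline run per underlying model, but that grouping is an assumption. The Python sketch below tests only that reading. It checks whether each reported CR is reproduced, within display rounding, by exactly one of those assumed baselines. The names `BASELINE_TOKENS`, `compression_ratio` and `matching_baselines` are illustrative and not part of the leaderboard.

```python
# Sketch: check whether CR (%) = 100 * Tok / baseline Tok for the table rows.
# ASSUMPTION: the baselines are the four rows that report CR = 100; the page
# does not say how CR is defined or which rows share a base model.
BASELINE_TOKENS = [15_137, 16_326, 11_130, 14_461]

# (Acc, Tok, CR, Cost) transcribed from the Evaluation Results table.
ROWS = [
    (76, 15_137, 100, 15_137),
    (76, 14_526, 96, 15_096),
    (71.3, 11_086, 73.2, 11_295),
    (70.2, 10_720, 70.8, 10_921),
    (69.1, 16_326, 100, 16_326),
    (68.7, 16_325, 100, 16_960),
    (68.1, 13_115, 80.3, 13_400),
    (67.7, 12_588, 77.1, 12_800),
    (66.1, 12_666, 83.7, 13_130),
    (61.7, 13_995, 85.7, 14_489),
    (56.7, 14_130, 86.5, 27_343),
    (55, 11_130, 100, 11_130),
    (54.9, 10_729, 96.4, 11_169),
    (53.7, 10_514, 94.5, 11_018),
    (52.7, 9_631, 86.5, 9_847),
    (52.4, 9_282, 83.4, 9_493),
    (52, 9_394, 84.4, 18_991),
    (37.6, 14_461, 100, 14_461),
    (37.3, 14_478, 100.1, 15_003),
    (35, 11_508, 79.6, 12_616),
    (34.7, 10_649, 73.6, 11_608),
    (34.3, 12_624, 87.3, 13_472),
    (31.9, 4_341, 39, 5_939),
    (28.9, 5_958, 36.5, 9_073),
    (23.4, 6_328, 43.8, 8_934),
]


def compression_ratio(tok: int, baseline_tok: int) -> float:
    """Tokens relative to a baseline run, in percent, rounded to one decimal."""
    return round(100 * tok / baseline_tok, 1)


def matching_baselines(tok: int, cr: float, tol: float = 0.05) -> list[int]:
    """Baselines whose unrounded ratio lies within display rounding of CR."""
    return [b for b in BASELINE_TOKENS if abs(100 * tok / b - cr) <= tol]


if __name__ == "__main__":
    for acc, tok, cr, _cost in ROWS:
        print(f"Acc={acc:>5}  Tok={tok:>6,}  CR={cr:>5}  "
              f"baseline={matching_baselines(tok, cr)}")
```

The tolerance of 0.05 is the largest error that one-decimal rounding can introduce. A match therefore means the reported CR is consistent with that baseline, not that the leaderboard necessarily computes CR this way. The Cost column is left out because the page gives no definition for it.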