
Mathematical Reasoning on AIME (Result and Process Tracking)

Result Accuracy: 26.67

StaRPO

Apr 10, 2026
Updated 6d ago

Evaluation Results

Method  Links
2026.04  26.67  23.33
2026.04  23.33  20
2026.04  23.33  16.67
2026.04  16.67  13.33
2026.04  13.33  10
2026.04  13.33  13.33
2026.04  10     10
2026.04  10     10
2026.04  10     10
2026.04  6.67   6.67
2026.04  6.67   6.67
2026.04  6.67   6.67
2026.04  3.33   3.33
2026.04  0      0