
Mathematical Reasoning on AIME25 (Accuracy, C_mem, C_time)

StepFlow: 69.2 Accuracy

[Chart: Accuracy over time, Jan 29, 2026 to Apr 8, 2026]
Updated 9d ago

Evaluation Results

Date      Accuracy   C_mem   C_time
2026.04   69.2       -       -
2026.01   68.9       0.75    3.57
2026.01   68.67      1       1
2026.01   68.28      0.26    0.5
2026.01   68.12      0.41    2.19
2026.01   67.79      0.37    2.75
2026.04   66.7       -       -
2026.01   65.3       0.32    2
2026.04   62         -       -
2026.04   57.7       -       -
2026.04   54.9       -       -
2026.01   54.15      0.17    0.56
2026.01   54         1       1
2026.01   53.86      0.28    2.25
2026.01   53.81      0.26    2.41
2026.01   52.97      0.24    3.76
2026.01   50.23      0.22    2.78
2026.04   50.2       -       -
2026.04   48         -       -
2026.04   46.5       -       -
2026.04   43.8       -       -
2026.04   40         -       -
2026.04   39.5       -       -
2026.04   39.2       -       -
2026.04   39.1       -       -
2026.01   10.32      0.35    0.68
2026.01   10.25      0.78    1.88
2026.01   10.06      1       1
2026.01   10.02      0.33    1.45
2026.01   9.94       0.66    2.83
2026.01   8.89       0.28    0.97
2026.01   6.57       1       1
2026.01   6.38       0.17    0.52
2026.01   6.08       0.41    3.5
2026.01   5.99       0.21    2.44
2026.01   4.87       0.72    5.22
2026.01   4.35       0.14    0.98