Math Reasoning on AIME 2025 (Full Pass@k Evaluation)

Pass@1: 11.7

[Chart: results over time, Oct 4, 2025 to Apr 1, 2026]
Updated 16d ago

Evaluation Results

Method    Links
Date   Pass@1
2025.10  11.7  17.9  24.9  32.1  38.9  44.7  50.7  57.9  66.7     -
2025.10  10.5  16.4  23.2  30.2  37.4  43.9  49.7  55.6  63.3     -
2025.10   9.6  15.2  21.9  29.2  36.2  42.6  49.8  58.3  63.3     -
2025.10   9.3  15.2  22.4  29.9  36.5  42.4  48.5  55.9  63.3     -
2026.04   7.4     -     -     -     -     -     -     -     -  29.2
2026.04   7.4     -     -     -     -     -     -     -     -  34.4
2026.04   7.1     -     -     -     -     -     -     -     -  29.1
2025.10     6  10.1  15.3  20.9  26.8  33.9  41.7    50    60     -
2025.10   5.9   9.9    15  20.5  26.5  33.6  41.5  49.8  56.7     -
2025.10   5.4   9.2  14.1  19.4    25  31.7  39.5    48  56.7     -
2025.10   4.6     8  12.8  18.7  25.6  33.7    43  52.5    60     -
2026.04   3.1     -     -     -     -     -     -     -     -  26.9
2025.10   2.7     5   8.9  14.7  21.7  29.5  37.4  44.5    50     -
2025.10   1.3   2.6   4.9   8.6  13.9  19.9  26.2  33.4    40     -
2025.10   0.2   0.4   0.6   1.2   2.5   4.8   9.2  17.1    30     -
2025.10   0.2   0.3   0.6   1.2   2.5   4.8   9.2  17.1    30     -
2025.10   0.2   0.3   0.6   1.1   2.3   4.6     9  17.5  33.3     -
2025.10   0.1   0.2   0.3   0.6   1.2   2.5     5    10    20     -
2025.10   0.1   0.3   0.5   0.9   1.9   3.7   7.3  14.2  26.7     -
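
For readers unfamiliar with the metric, here is a minimal sketch of how a pass@k score can be estimated from sampled answers. It uses the widely used unbiased estimator, 1 - C(n-c, k) / C(n, k), for a problem with n samples of which c are correct. This page does not state how its numbers were computed, so treat this as an assumption. The function names and the example counts are hypothetical and are not taken from the leaderboard.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of them correct.

    Assumption: the standard unbiased estimator 1 - C(n-c, k) / C(n, k).
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k incorrect samples exist,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average pass@k over all problems, reported in percent."""
    per_problem = [pass_at_k(n, c, k) for c in correct_counts]
    return 100.0 * sum(per_problem) / len(per_problem)


if __name__ == "__main__":
    # Hypothetical data: 3 problems, 8 samples each, with 0, 2 and 8 correct.
    counts = [0, 2, 8]
    for k in (1, 2, 4, 8):
        print(f"pass@{k}: {benchmark_pass_at_k(counts, n=8, k=k):.1f}")
```

Because the score is averaged over problems, a problem with no correct samples contributes 0 at every k, while any problem with at least one correct sample contributes 1 once k reaches n.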