Mathematical Reasoning on AIME 2025 (Accuracy, Mean, Drop)

Accuracy: 80 (BF16)

May 18, 2026

Evaluation Results

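| Date | Accuracy | Mean | Drop |
|---|---|---|---|
| 2026.05 | 80 | 77.89 | - |
| 2026.05 | 80 | 78.15 | 0.26 |
| 2026.05 | 78.89 | 77.95 | 0.06 |
| 2026.05 | 78.89 | 75.14 | -2.75 |
| 2026.05 | 78.89 | 78.16 | 0.27 |
| 2026.05 | 74.67 | 75.64 | - |
| 2026.05 | 74 | 74.17 | -0.02 |
| 2026.05 | 72.67 | 74.43 | 0.24 |
| 2026.05 | 70 | 73.11 | -2.53 |
| 2026.05 | 70 | 70.84 | - |
| 2026.05 | 68.67 | 74.19 | - |
| 2026.05 | 68 | 69.97 | -0.87 |
| 2026.05 | 66.67 | 69.416 | -1.42 |
| 2026.05 | 66.67 | 71.99 | -2.2 |
| 2026.05 | 64 | 71.864 | -3.78 |
| 2026.05 | 46.67 | 56.88 | -13.96 |
| 2026.05 | 37.78 | 60.49 | -17.4 |
| 2026.05 | 16.67 | 31.74 | -43.9 |
| 2026.05 | 2.22 | 10.14 | -60.7 |
| 2026.05 | 0 | 1.4 | -74.24 |
| 2026.05 | 0 | 0 | -75.64 |
| 2026.05 | 0 | 0 | -70.84 |
| 2026.05 | 0 | 7.9 | -66.29 |
| 2026.05 | 0 | 0 | -74.19 |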
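
The Drop values in the results above appear to equal a row's Mean minus the Mean of a baseline row (a row whose Drop is "-"). The page does not state this definition; it is an inference from the numbers. Under that assumption, a minimal Python sketch reproduces several of the listed Drop values from the baseline Mean of 77.89. The function name `drop` and the grouping of these rows under that baseline are illustrative assumptions, not taken from the page.

```python
def drop(mean: float, baseline_mean: float) -> float:
    """Assumed definition: difference between a row's Mean and the baseline Mean."""
    # Round to two decimals to match the precision shown in the results.
    return round(mean - baseline_mean, 2)


# Baseline Mean taken from the first result row (Drop shown as "-").
baseline_mean = 77.89

# Mean values from rows assumed to share that baseline.
means = [78.15, 77.95, 75.14, 78.16, 60.49]

drops = [drop(m, baseline_mean) for m in means]
print(drops)  # [0.26, 0.06, -2.75, 0.27, -17.4]
```

A negative value means the row's Mean falls below its baseline; small positive values such as 0.26 mean the row scored marginally above it.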