Mathematical Reasoning on DEEPMATH 128 samples

Top-1 Accuracy: 35.93

ePF

[Chart: Top-1 Accuracy over time. Y-axis ticks: 9.1188, 16.0794, 23.04, 30.0006. X-axis: Oct 7, 2025]
Updated 18d ago

Evaluation Results

Date      Top-1 Accuracy
2025.10   35.93
2025.10   34.37
2025.10   32.03
2025.10   32.03
2025.10   30.46
2025.10   25
2025.10   23.43
2025.10   22.65
2025.10   21.09
2025.10   20.31
2025.10   13.28
2025.10   10.15