
Mathematical Reasoning on MATH500 (128-sample random subset)

Top-1 Accuracy: 71.09

ePF

[Chart: Top-1 Accuracy by date (Oct 7, 2025)]
Updated 18d ago

Evaluation Results

Date      Top-1 Accuracy (%)
2025.10   71.09
2025.10   70.31
2025.10   67.96
2025.10   66.42
2025.10   66.4
2025.10   65.62
2025.10   62.5
2025.10   60.93
2025.10   60.15
2025.10   57.81
2025.10   53.9
2025.10   45.31