
Mathematical Reasoning on MATH-500 (Accuracy, Token Count, Calibration Rate)

[Chart: Accuracy (93.5) by date, Aug 13, 2025 to Mar 13, 2026; series: Vanilla]
Updated 2d ago

Evaluation Results

Date     Accuracy  Token Count  Calibration Rate
2026.03  93.5      6,212        100
2026.03  93        6,385        100
2025.08  92.2      4,223        -
2026.03  92        4,598        100
2025.08  91.4      2,033        -
2025.08  91.2      3,563        -
2026.03  91.1      5,037        100
2026.03  90.7      2,425        45.1
2026.03  90.7      2,261        46.8
2026.03  90.2      2,946        43.9
2025.08  90.2      2,413        -
2026.03  90.1      4,372        93.9
2025.08  90.06     2,778        -
2026.03  89.8      3,778        92
2025.08  89.8      3,555        -
2026.03  89.1      2,863        47.8
2025.08  88.8      2,016        -
2026.03  88.1      2,967        56.8
2026.03  87.7      5,695        87.8
2026.03  87.3      5,860        95.9
2026.03  86.3      3,240        55.5
2026.03  84.1      786          17.5
2026.03  83.2      1,908        28.1
2025.08  83.2      2,397        -
2025.08  83        5,665        -
2025.08  82.8      2,813        -
2025.08  82.2      3,374        -
2025.08  81.8      4,788        -
2025.08  81.8      3,335        -
2026.03  80.9      2,501        56.2
2026.03  80.7      809          16.1
2026.03  79.9      2,602        52
2026.03  79.6      1,702        42.4
2026.03  79.1      535          11.7
2026.03  78.3      1,850        41
2026.03  71        3,791        60.3
2026.03  69.8      4,279        74.6