WorkDL

Mathematical Reasoning on MATH (Accuracy and Token Usage)

Accuracy: 96.67

Full-Graph

[Chart: accuracy on MATH over time, Mar 7, 2024 to Mar 9, 2026]
Updated 26d ago

Evaluation Results

Date     Accuracy  Token Usage  Links
2026.02  96.67     -            -
2026.02  96.67     -            -
2026.02  95.56     -            -
2026.02  95.56     -            -
2026.02  95.56     -            -
2026.02  94.44     -            -
2026.02  94.44     -            -
2026.02  94.44     -            -
2026.02  94.44     -            -
2026.03  94.3      -            -
2026.03  94        -            -
2026.03  93.9      -            -
2026.03  93.7      -            -
2026.02  93.33     -            -
2026.02  93.33     -            -
2026.03  93.3      -            -
2026.02  92.22     -            -
2026.02  92.22     -            -
2026.02  91.11     -            -
2026.02  91.11     -            -
2026.02  91.11     -            -
2026.03  91.1      1,605        -
2026.03  90.5      -            -
2026.03  90.4      1,821        -
2026.03  90.3      2,229        -
2026.02  90        -            -
2026.03  89.1      1,324        -
2026.03  88.9      -            -
2026.02  88.88     -            -
2026.03  87.8      2,270        -
2026.02  87.77     -            -
2026.03  87.6      1,700        -
2026.03  87.5      -            -
2026.03  87.5      1,801        -
2026.03  87        -            -
2026.03  86.7      1,598        -
2026.03  85.3      2,774        -
2026.03  85.1      1,914        -
2026.03  84.8      1,464        -
2026.03  84.5      -            -
2026.03  84.5      1,716        -
2026.02  84.44     -            -
2025.06  83.9      -            -
2025.06  83.7      -            -
2026.03  83.6      -            -
2025.06  83.5      -            -
2025.06  82.9      -            -
2026.03  81.3      -            -
2026.03  80.8      -            -
2026.02  80.59     -            -
2026.03  80.3      -            -
2026.03  80.1      -            -
2026.03  79.4      -            -
2026.03  79.4      -            -
2026.03  78.8      -            -
2026.03  78.8      -            -
2026.03  78.8      -            -
2026.03  78.7      -            -
2026.03  78.7      -            -
2026.03  78.3      -            -
2026.02  77.78     -            -
2026.02  77.2      -            -
2026.02  77.12     -            -
2026.02  76.58     -            -
2026.02  76.4      -            -
2026.02  76.24     -            -
2026.02  76.18     -            -
2026.02  76.05     -            -
2026.02  75.97     -            -
2026.02  75.64     -            -
2026.02  75.32     -            -
2026.02  74.88     -            -
2026.02  74.44     -            -
2026.02  74.4      -            -
2024.07  73.8      -            -
2026.03  73.7      -            -
2026.02  72.5      -            -
2026.03  72.5      914          -
2026.03  71.2      -            -
2026.03  71.2      -            -
2024.07  71.1      -            -
2026.03  70.8      -            -
2026.03  68.4      -            -
2026.03  67.6      -            -
2026.03  67.6      -            -
2026.03  67.6      -            -
2025.09  67.4      -            -
2026.03  67.4      -            -
2026.03  67.3      851          -
2026.03  66.2      -            -
2025.04  66.2      -            -
2024.03  66.1      -            -
2026.03  66.1      -            -
2025.09  65.8      -            -
2025.09  65.8      -            -
2025.09  65.2      -            -
2026.03  65.2      -            -
2026.03  65        -            -
2026.03  65        -            -
2026.01  64.88     -            -
Showing 100 of 370 rows