Mathematical Reasoning on MATH (Accuracy and Token Usage)

[Chart: accuracy over time, Jan 5, 2024 to Feb 23, 2026. Best accuracy: 96.67 (Full-Graph).]
Updated 3d ago

Evaluation Results

Date      Accuracy  (unlabeled)  (unlabeled)
2026.02   96.67     -            -
2026.02   96.67     -            -
2026.02   95.56     -            -
2026.02   95.56     -            -
2026.02   95.56     -            -
2026.02   94.44     -            -
2026.02   94.44     -            -
2026.02   94.44     -            -
2026.02   94.44     -            -
2026.02   93.33     -            -
2026.02   93.33     -            -
2026.02   92.22     -            -
2026.02   92.22     -            -
2026.02   91.11     -            -
2026.02   91.11     -            -
2026.02   91.11     -            -
2026.02   90        -            -
2026.02   88.88     -            -
2026.02   87.77     -            -
2026.02   84.44     -            -
2025.06   83.9      -            -
2025.06   83.7      -            -
2025.06   83.5      -            -
2025.06   82.9      -            -
2026.02   77.78     -            -
2026.02   74.44     -            -
2024.07   73.8      -            -
2026.02   72.5      -            -
2024.07   71.1      -            -
2024.03   66.1      -            -
2026.01   64.88     -            -
2026.02   64.7      -            -
2026.01   64.39     -            -
2026.01   63.61     -            -
2026.02   63.1      -            -
2025.12   60.7      -            70.5
2025.12   60.2      -            69.1
2025.12   59.7      -            68.2
2025.12   58.6      -            68.7
2025.09   55.37     -            83.26
2026.02   52.9      -            -
2025.09   52.42     -            80.66
2025.09   52.3      -            75.86
2025.09   52.25     -            80.18
2025.09   52        -            77.29
2025.09   51.82     -            79.73
2025.09   51.82     -            80.43
2026.02   51.7      -            -
2026.02   51        -            -
2025.09   51        -            75.5
2025.12   49.8      -            63.5
2025.09   48.73     -            74.13
2026.02   48.4      -            -
2025.09   48.28     -            73.39
2025.09   48        -            74.08
2025.09   48        -            74.04
2025.09   47.91     -            75.42
2026.02   47.8      -            -
2025.12   47.4      -            60.3
2025.09   46.53     -            75.57
2025.09   46.4      -            73.64
2025.09   46.29     -            73.89
2025.09   46.1      -            70.05
2025.12   46        -            57.9
2025.09   45.89     -            72.05
2025.12   45.4      -            57.2
2025.09   45.37     -            72.01
2025.12   43.8      -            58.1
2026.02   43.5      -            -
2025.09   43.18     -            72.23
2024.03   43.1      -            -
2025.07   41        98.9         -
2024.03   40.8      -            -
2025.12   37.7      -            55.6
2026.02   37.1      -            -
2026.02   36.9      -            -
2025.12   36.7      -            57.8
2026.02   36.6      -            -
2025.12   36.2      -            51.9
2025.12   35.8      -            65.2
2024.03   35.6      -            -
2024.03   35.2      -            -
2026.02   34.8      -            -
2025.12   34.8      -            46.1
2026.01   34.8      -            -
2025.06   34.7      -            -
2026.02   34.3      -            -
2024.01   34.1      -            -
2026.01   33        -            -
2025.07   32.5      0.94         -
2026.02   31.6      -            -
2026.02   31.6      -            -
2025.06   31.6      -            -
2026.02   31.3      -            -
2026.02   31.2      -            -
2024.03   30.26     -            -
2025.12   30        -            43.3
2025.06   30        -            -
2025.07   29.5      33.3         -
2026.02   28.7      -            -
Showing 100 of 194 rows