Math Reasoning on MATH lighteval

98.4 During-task Accuracy

Cloud LLM Cluster

[Chart: y-axis ticks 48.376, 61.363, 74.35, 87.337; x-axis date Jan 29, 2026]
Updated 4d ago

Evaluation Results

Method  Links
2026.01  98.4  98.4  -
2026.01  84.5  75.5  10.7
2026.01  80.7  74.8  7.3
2026.01  78.1  68.5  12.3
2026.01  77.4  69.9  9.7
2026.01  77.2  67.7  12.3
2026.01  73.5  59.8  18.6
2026.01  73.1  61.3  16.1
2026.01  72.7  63.1  13.2
2026.01  72.2  66.5  7.9
2026.01  70.3  63.2  10.1
2026.01  70.3  63.1  10.2
2026.01  68.3  60.6  11.3
2026.01  67.8  59.2  12.7
2026.01  64.7  52.5  18.9
2026.01  63.8  58.5  8.3
2026.01  62.2  55.4  10.9
2026.01  62    54.2  12.6
2026.01  61.9  53.4  13.7
2026.01  59    53.7  9
2026.01  58.7  49.6  15.5
2026.01  56    48.1  14.1
2026.01  55.3  44.4  19.7
2026.01  55.3  44.4  19.7
2026.01  55.2  44.5  19.4
2026.01  53.1  43.7  17.7
2026.01  51.6  33.2  35.7
2026.01  51.6  33.2  35.7
2026.01  50.3  32.8  34.8
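
In each row, the third value is consistent with the relative drop from the first value to the second, expressed as a percentage and rounded to one decimal. The sketch below checks that reading on a few rows. The column meanings are not stated in the table, so this interpretation is an assumption, and `relative_drop` is a hypothetical helper name.

```python
# Assumption: the third value in a row is the percentage drop from the
# first value to the second. The table itself does not label these columns.

def relative_drop(first: float, second: float) -> float:
    """Percentage drop from `first` to `second`, rounded to one decimal."""
    return round((first - second) / first * 100, 1)

# (first value, second value, reported third value), copied from the table
rows = [
    (84.5, 75.5, 10.7),
    (80.7, 74.8, 7.3),
    (62.0, 54.2, 12.6),
    (59.0, 53.7, 9.0),
    (51.6, 33.2, 35.7),
]

for first, second, reported in rows:
    computed = relative_drop(first, second)
    print(f"{first:>5} {second:>5}  reported={reported:>5}  computed={computed:>5}")
```

The top row reports `-` as its third value, which fits this reading: its first two values are both 98.4, so there is no drop to report.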