
Mathematical Reasoning on Math Benchmarks Aggregate (Average Accuracy and Length)

Accuracy (Avg): 81.9

[Chart: Accuracy (Avg) over time; y-axis ticks 50.2008, 58.4304, 66.66, 74.8896; x-axis tick Feb 2, 2026]
Updated 4d ago

Evaluation Results

Date      Accuracy (Avg)   Length (Avg)
2026.02   81.9             6,564.23       -
2026.02   81.28            5,487.14       0.09
2026.02   81.17            4,784.84       0.18
2026.02   80.94            2,880.33       0.44
2026.02   80.78            6,246.5        -0.09
2026.02   80.65            3,930.1        0.25
2026.02   79.96            5,497.31       -0.07
2026.02   76.45            5,251.44       -0.47
2026.02   75.33            2,745.73       -0.22
2026.02   64.03            1,760.43       -1.45
2026.02   63.83            1,986.46       -1.51
2026.02   55.52            620.85         0.19
2026.02   55.43            522.29         0.33
2026.02   54.99            534.08         0.27
2026.02   54.28            673.14         -
2026.02   53.21            595.6          -0.08
2026.02   52.9             430.66         0.11
2026.02   51.42            721.71         -0.6