
Reasoning on GSM8K, MATH500, HumanEval, and MBPP Suite

Average Accuracy: 60.3

Info-Gain

Feb 20, 2026
Updated 4d ago

Evaluation Results

Date      Average Accuracy   (unlabeled column)
2026.02   60.3               41
2026.02   60.3               52.1
2026.02   55.8               74.1
2026.02   55.3               98
2026.02   40                 105.3
2026.02   40                 180.3
2026.02   36.9               117.4
2026.02   35                 210.3
2026.02   35                 334.3
2026.02   34.9               120.4
2026.02   34.3               163.5
2026.02   33                 138.2
2026.02   32.1               204.1
2026.02   29.3               334.1
2026.02   28.4               238.6
2026.02   27.8               171.4
2026.02   27.7               208
2026.02   26.7               230.9
2026.02   22.8               419.5
2026.02   22.7               398