
General Reasoning on GSM8k, MATH 500, GPQA, SuperGPQA

63.8 Average Accuracy

BF16

6.28821.21936.1551.081
Jan 20, 2026
Updated 3d ago

Evaluation Results

Date       Average Accuracy
2026.01    63.8
2026.01    62.7
2026.01    60.9
2026.01    56.9
2026.01    55.9
2026.01    52.6
2026.01    29.6
2026.01    25.2
2026.01    23.2
2026.01    13
2026.01    8.5