Mathematical Reasoning on AIME25 (Accuracy, Average response length)

Accuracy: 80.3

Qwen3-4B-Thinking

[Chart: y-axis ticks 48.788, 56.969, 65.15, 73.331; x-axis date Jan 7, 2026]
Updated 3d ago

Evaluation Results

Date      Accuracy (%)   Average response length
2026.01   80.3           23,912
2026.01   76.7           17,338
2026.01   68.9           11,969
2026.01   67.8           11,395
2026.01   67.8           10,157
2026.01   60             10,891
2026.01   60             9,934
2026.01   57.8           10,099
2026.01   56.7           12,247
2026.01   54.4           11,080
2026.01   50             7,368