Reasoning on MATH-500, AIME 24, AIME 25, GPQA Diamond, CommonsenseQA, LiveCodeBench, and LongBenchv2 Qwen3

74.8 Accuracy

Base Model

[Chart: accuracy by date; y-axis ticks 47.24, 54.395, 61.55, 68.705; x-axis label Jan 6, 2026]
Updated 4d ago

Evaluation Results

Method    Links
2026.01   74.8   -   69.2
2026.01   72.2   -   72.3
2026.01   72.1   -   72.5
2026.01   69     -   75.5
2026.01   69.9   -   71.6
2026.01   68.7   -   76.6
2026.01   50.9   -   65.4
2026.01   50.7   -   59.7
2026.01   50.4513,318   59.6
2026.01   50.3   -   62.1
2026.01   49.1   -   62.1
2026.01   48.3   -   62.1