Mathematical Reasoning on MATH 500 (During-task Acc., Post-Switch Acc.)

During-task Accuracy (MATH 500): 97.3

Cloud LLM Cluster

[Chart: During-task Accuracy (MATH 500); date shown: Jan 29, 2026]
Updated 4d ago

Evaluation Results

During-task Acc.   Post-Switch Acc.   Date
97.3               97.3               2026.01
87.2               77.4               2026.01
86.6               76                 2026.01
81                 72.4               2026.01
79.6               70.2               2026.01
77.7               68.1               2026.01
75.2               63.3               2026.01
75                 66.2               2026.01
74.8               64                 2026.01
74                 66.2               2026.01
73.3               62.1               2026.01
71.1               58.3               2026.01
70.2               59.6               2026.01
68.2               58.2               2026.01
66.4               58                 2026.01
65.6               49.8               2026.01
65.6               57.8               2026.01
65.4               52.9               2026.01
62                 55.8               2026.01
60.1               54.9               2026.01
58.5               43.4               2026.01
55.7               41.4               2026.01
54.6               40                 2026.01
54.2               40.8               2026.01
54.2               40.8               2026.01
52                 40.9               2026.01
46.6               38.8               2026.01
46.6               38.8               2026.01
45.7               37.1