
Mathematical Reasoning on MATH 500 (During-task Acc., Post-Switch Acc.)

97.3 During-task Accuracy (MATH 500)

Cloud LLM Cluster

[Chart: During-task Accuracy (MATH 500); y-axis ticks 43.636, 57.568, 71.5, 85.432; x-axis Jan 29, 2026]
Updated 1mo ago

Evaluation Results

During-task Acc.   Post-Switch Acc.   Date
97.3               97.3               2026.01
87.2               77.4               2026.01
86.6               76                 2026.01
81                 72.4               2026.01
79.6               70.2               2026.01
77.7               68.1               2026.01
75.2               63.3               2026.01
75                 66.2               2026.01
74.8               64                 2026.01
74                 66.2               2026.01
73.3               62.1               2026.01
71.1               58.3               2026.01
70.2               59.6               2026.01
68.2               58.2               2026.01
66.4               58                 2026.01
65.6               49.8               2026.01
65.6               57.8               2026.01
65.4               52.9               2026.01
62                 55.8               2026.01
60.1               54.9               2026.01
58.5               43.4               2026.01
55.7               41.4               2026.01
54.6               40                 2026.01
54.2               40.8               2026.01
54.2               40.8               2026.01
52                 40.9               2026.01
46.6               38.8               2026.01
46.6               38.8               2026.01
45.7               37.1