Mathematical Reasoning on MATH500 (Accuracy, Tokens)

94.6 Accuracy

S1

[Chart: accuracy over time, Jul 21, 2025 to May 21, 2026]
Updated 9d ago

Evaluation Results

Date     Accuracy  Tokens
2026.05  94.6      4,549
2026.05  93.8      4,155
2026.05  93.8      3,867
2026.05  93.2      4,261
2026.05  92.8      4,209
2026.05  91        1,851
2026.04  89        2,663.1
2026.05  88.6      2,391
2026.05  88.2      2,008
2026.05  87.8      1,978
2026.05  87.4      1,281
2026.04  86.8      2,459.2
2026.05  86.7      2,059
         86.6      2,042
2026.05  86.5      1,790
2026.03  86.4      2,365
2026.04  86.4      2,429.8
2026.03  86.3      2,168
2026.04  86.2      3,183.6
2026.04  86.2      -
2026.05  86.2      1,406
         85.8      3,280
2026.05  85.8      1,459
         85.5      1,707
2026.05  85.5      1,423
2026.04  85.4      2,272.6
2026.04  85.4      -
2026.04  85.2      -
2026.04  85        -
2026.03  84.9      1,635
2026.04  84.6      -
2026.03  84.4      2,370
2026.04  84.4      -
2026.04  83.8      1,958.2
2026.04  83.4      -
2026.04  83.4      -
2026.05  83.2      1,867
2026.04  83        1,008.8
2026.04  83        -
2026.05  82.5      1,988
2026.04  82        -
2026.04  81.8      1,337.3
2026.04  81.6      1,692.4
2026.04  81.6      1,273.7
2026.04  81.6      -
2026.05  81.4      1,932
2026.05  81.4      2,038
2026.03  81.3      2,944
2026.05  80.4      2,205
2026.04  80        1,837
2026.04  77.2      680.2
2026.04  74        695
2025.10  73.5      2,020
2026.04  69.4      4,374.3
2025.10  69.2      1,775
2025.10  68.3      1,003
2025.10  68        973
2025.10  67.6      954
2025.10  64.4      851
2025.10  63.3      1,460
2025.10  55.6      1,039
2026.04  49.6      598.4
2025.07  49.2      313.2
2025.07  48        370.4
2025.07  45.8      435.7
2025.07  44.8      555.7
2025.07  44.8      495.1
2025.07  43        594.8
2025.07  40.4      608.5
2025.07  39        590.7
2025.07  38.6      353.4
2025.07  37.6      448.8
2025.07  37.2      376.3
2025.07  37        555.2
2025.07  34.5      441.2
2025.07  33.2      609.9
2025.07  31.4      309.2
2025.07  31        610
2025.07  31        329
2025.07  30.4      411.1
2025.07  30        376.1
2025.07  29.6      507.3
2025.07  28.6      375.7
2025.07  27.4      440.8
2025.07  25.4      472
2025.07  23.5      470.5