
Math Reasoning on MATH500 (Accuracy)

95.6 Accuracy

CE-GPPO

[Chart: accuracy over time, May 16, 2025 to Mar 2, 2026]
Updated 13d ago

Evaluation Results

Date      Accuracy  Method  Links
2025.09   95.6
2025.09   95.1
2025.09   94.9
2025.09   93.7
2025.09   93.6
2025.11   93.2
2025.11   91.6
2025.11   91.4
2025.11   91.4
2025.11   91
2025.11   91
2025.09   91
2025.09   90.9
2025.11   90
2025.09   90
2025.11   89.5
2026.03   89.4
2026.03   89.2
2026.03   89.2
2025.11   89
2026.03   88.4
2025.09   88.3
2025.11   88
2025.11   87.6
2025.11   87.6
2025.11   86.4
2025.11   86
2025.09   86
2025.11   85.8
2025.11   85.8
2025.11   85.6
2025.11   85.2
2025.11   85.2
2025.11   85
2025.11   85
2025.11   84.4
2025.11   84.2
2025.11   84
2025.11   83.8
2025.11   83
2025.11   82
2025.11   81.5
2025.11   81.4
2025.11   80
2025.11   80
2025.11   80
2025.05   79.6
2025.05   79.4
2025.05   79.2
2025.11   79
2025.11   78.8
2025.11   78.5
2025.11   78
2025.11   78
2025.11   77.6
2025.11   77.5
2025.11   76.8
2025.11   72.8
2025.11   72.6
2025.05   71.6
2025.11   71.4
2026.03   71
2026.03   69.6
2026.03   69.6
2026.03   69
2026.03   69
2026.03   68.4
2025.11   68.2
2025.11   67.8
2025.11   64.8
2025.11   63.2
          55.4
2025.11   51.8
2025.11   46.2
2025.11   39.9
2025.11   39.6
2025.11   37.8
2025.11   37.4
2026.02   29.6
2026.02   26.4
2026.02   25.6
2026.02   22.4
2026.02   0.4