
Mathematical Reasoning on MATH500 1.0 (test)

Accuracy: 96.9

A3PO

[Figure: accuracy over time, Sep 26, 2025 to Feb 23, 2026]
Updated 1mo ago

Evaluation Results

Date     Accuracy (%)
2025.12  96.9
2025.12  96.2
2025.12  95.8
2025.12  95.7
2025.12  95.5
2025.12  95.2
2025.12  91.3
2025.12  90.4
2025.12  87.4
2025.12  87.3
2025.12  87.1
2025.09  87.05
2025.12  86.9
2025.12  86.8
2025.12  86.2
2025.12  85.4
2025.12  84.5
2025.12  83.8
2025.09  83.47
2025.12  82.3
2025.09  79.95
2025.09  78.68
2026.02  51.9
2026.02  51.6
2026.02  50.8
2026.02  50.4
2026.02  49.6
2026.02  48.4
2026.02  47.6
2026.02  47.2
2026.02  46.6
2026.02  44.3
2026.02  42.7
2026.02  42.6
2026.02  42.5
2026.02  41.7
2026.02  37.2
2026.02  35