
Mathematical Reasoning on MATH (pass@1, maj@8, rm@8)

81.1 Pass@1

GPT-4o-2024-08-06

[Chart: score over time, Jun 7, 2024 to Jan 31, 2026]
Updated 3mo ago

Evaluation Results

Date     Pass@1  maj@8  rm@8  (unlabeled)
2024.09  81.1    -      -     -
2024.09  79.9    85.3   88.9  -
2026.01  78.2    -      -     -
2024.09  75.8    80.3   83.9  -
2026.01  74      -      -     -
2026.01  73.8    -      -     -
2026.01  73.5    -      -     -
2026.01  73.5    -      -     -
2026.01  73.3    -      -     -
2026.01  71.8    -      -     -
2026.01  66.6    -      -     -
2026.01  53.4    -      -     -
2026.01  44.6    -      -     -
2026.01  28      -      -     -
2026.01  28      -      -     -
2026.01  26.8    -      -     -
2026.01  25.8    -      -     -
2026.01  22.6    -      -     -
2024.06  8.52    -      -     22.55
2024.06  7.6     -      -     21.46
2024.06  7.48    -      -     22.46
2024.06  6.2     -      -     18.18
2024.06  5.92    -      -     18.3
2026.01  3.2     -      -     -
2026.01  -       -      -     90
2026.01  -       -      -     92
2026.01  -       -      -     90
2026.01  -       -      -     88
2026.01  -       -      -     73
2026.01  -       -      -     90
2026.01  -       -      -     81
2026.01  -       -      -     84
2026.01  -       -      -     92
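
The benchmark title names three metrics: pass@1, maj@8, and rm@8. The sketch below shows one common way to compute them for a single problem. It rests on assumptions not stated on this page: answers are compared by exact string match, pass@1 is estimated as the fraction of sampled answers that are correct, maj@8 takes the most frequent of 8 sampled answers, and rm@8 picks the answer with the highest reward-model score. The function names and the `scores` list are hypothetical, and individual entries on the leaderboard may use different protocols.

```python
from collections import Counter

def pass_at_1(samples, reference):
    """Fraction of sampled answers equal to the reference (per-problem pass@1 estimate)."""
    return sum(answer == reference for answer in samples) / len(samples)

def maj_at_k(samples, reference):
    """1.0 if the most frequent answer among the k samples equals the reference, else 0.0."""
    top_answer, _count = Counter(samples).most_common(1)[0]
    return float(top_answer == reference)

def rm_at_k(samples, scores, reference):
    """1.0 if the sample with the highest score equals the reference, else 0.0.

    `scores` stands in for the output of a reward model (hypothetical here).
    """
    best_index = max(range(len(samples)), key=lambda i: scores[i])
    return float(samples[best_index] == reference)

# Illustrative data only: 8 sampled answers to one problem whose reference answer is "12".
samples = ["12", "12", "7", "12", "5", "7", "12", "3"]
scores = [0.2, 0.9, 0.4, 0.6, 0.1, 0.3, 0.5, 0.2]  # made-up reward-model scores

print(pass_at_1(samples, "12"))        # fraction of correct samples
print(maj_at_k(samples, "12"))         # majority-vote correctness
print(rm_at_k(samples, scores, "12"))  # best-of-8 correctness by score
```

A benchmark-level score would then be the mean of the per-problem values, scaled to a percentage.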