
Mathematical Reasoning on AIME

Pass@1: 90

NSED (High-Perf open-weight)

[Chart: Pass@1 over time, Oct 18, 2024 to Jan 29, 2026]
Updated 12d ago

Evaluation Results

Method    Links
Date      Pass@1
2026.01   90
2026.01   84.2
2026.01   84
2026.01   78.3
2026.01   71.42
2026.01   54.42
2026.01   54
2026.01   36.67
2026.01   36.25
2026.01   36.25
2026.01   18.5
2026.01   16.4
2024.10   13.33
2024.10   13.33
2024.10   13.33
2025.09   12.7
2026.01   12
2025.09   11.8
2025.09   11.6
2025.09   11
2025.09   10.9
2025.09   10.8
2025.09   10.6
2025.09   10.2
2025.09   10.1
2024.10   10
2025.09   9.6
2025.09   8.9
2025.09   8.5
2025.09   8.5
2025.09   7.9
2025.09   6.7
2024.10   6.67
2024.10   6.67
2025.09   6.6
2024.10   3.33
2025.09   2.7
2025.09   2.3
2025.09   2.2
2025.09   2.1
2025.09   2.1
2025.09   2
2025.09   1.8
2025.09   1.3