
Mathematical Reasoning on AIME 25 (Acc@8, Pass@8)

86.7 Accuracy

Hermes@1

[Chart: accuracy over time, Nov 24, 2025 to May 27, 2026]

Evaluation Results

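Date | Accuracy | (unlabeled) | Pass@8 | Links
2025.11 | 86.7 | - | - | -
2025.11 | 86.7 | - | - | -
2025.11 | 86.7 | - | - | -
2025.11 | 83.3 | - | - | -
2025.11 | 83.3 | - | - | -
2025.11 | 83.3 | - | - | -
2025.11 | 80 | - | - | -
2025.11 | 80 | - | - | -
2025.11 | 76.7 | - | - | -
2025.11 | 76.7 | - | - | -
2026.05 | 76.5 | - | - | -
2026.05 | 73.1 | - | - | -
2026.05 | 71.8 | - | - | -
2026.05 | 71.4 | - | - | -
2026.05 | 71.4 | - | - | -
2026.05 | 71 | - | - | -
2025.11 | 70 | - | - | -
2025.11 | 70 | - | - | -
2025.11 | 70 | - | - | -
2026.05 | 69.9 | - | - | -
2026.05 | 69.1 | - | - | -
2026.05 | 67.9 | - | - | -
2026.05 | 67.6 | - | - | -
2026.05 | 66.2 | - | - | -
2026.05 | 65.7 | - | - | -
2025.11 | 63.3 | - | - | -
2025.11 | 60 | - | - | -
2025.11 | 60 | - | - | -
2025.11 | 60 | - | - | -
2025.11 | 60 | - | - | -
2025.11 | 56.7 | - | - | -
2025.11 | 53.3 | - | - | -
2025.11 | 53.3 | - | - | -
2026.05 | 52.5 | - | 73.3 | -
2025.11 | 50 | - | - | -
2026.05 | 46.6 | - | - | -
2025.11 | 43.3 | - | - | -
2026.05 | 42.4 | - | - | -
2026.05 | 36.2 | - | 66.7 | -
2026.05 | 33.33 | - | - | -
2026.05 | 33.3 | - | - | -
2026.05 | 33.3 | - | - | -
2025.11 | 33.3 | - | - | -
2026.05 | 32.8 | - | 53.3 | -
2026.05 | 30 | - | - | -
2025.11 | 30 | - | - | -
2025.11 | 30 | - | - | -
2025.11 | 30 | - | - | -
2026.05 | 29.2 | - | - | -
2026.05 | 26.7 | - | 56.7 | -
2026.05 | 26.7 | - | - | -
2026.05 | 26.7 | - | - | -
2026.05 | 26.7 | - | - | -
2026.05 | 26.7 | - | - | -
2026.05 | 26.7 | - | - | -
2026.05 | 26.67 | - | - | -
2026.05 | 24.8 | - | - | -
2025.11 | 23.3 | - | - | -
2025.11 | 23.3 | - | - | -
2025.11 | 23.3 | - | - | -
2026.05 | 20.9 | - | - | -
2026.05 | 20.83 | - | - | -
2026.05 | 20 | - | - | -
2026.05 | 20 | - | - | -
2026.05 | 20 | - | - | -
2025.11 | 20 | - | - | -
2025.11 | 20 | - | - | -
2026.05 | 18.8 | - | 43.3 | -
2026.05 | 18.3 | - | - | -
2026.05 | 17.7 | - | - | -
2026.05 | 17.2 | - | - | -
2026.05 | 16.7 | - | - | -
2026.05 | 16.7 | - | - | -
2026.05 | 16.7 | - | - | -
2026.05 | 16.7 | - | - | -
2025.11 | 16.7 | - | - | -
2026.05 | 16.1 | - | 36.7 | -
2026.05 | 15.7 | - | - | -
2026.05 | 15.7 | - | - | -
2026.05 | 15.4 | - | - | -
2026.05 | 15.1 | - | - | -
2026.05 | 15 | - | - | -
2026.05 | 14.7 | - | - | -
2026.05 | 14.6 | - | - | -
2026.05 | 14.3 | - | - | -
2026.05 | 13.9 | - | - | -
2026.05 | 13.8 | - | - | -
2026.05 | 13.3 | - | - | -
2026.05 | 13.3 | - | - | -
2026.05 | 13.3 | - | - | -
2026.05 | 12.5 | - | - | -
2026.05 | 12.4 | - | - | -
2026.05 | 11.7 | - | - | -
2026.05 | 11.1 | - | - | -
2026.05 | 10 | - | - | -
2026.05 | 9.2 | - | - | -
2026.05 | 9.1 | - | - | -
2026.05 | 8.8 | - | - | -
2026.05 | 8 | - | - | -
2026.05 | 7.5 | - | 20 | -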
Showing 100 of 236 rows
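The page title names two metrics, Acc@8 and Pass@8, but does not define them. The sketch below assumes the common definitions. Acc@8 is the mean fraction of correct answers over 8 sampled attempts per problem, averaged across problems. Pass@8 is the fraction of problems where at least one of the 8 attempts is correct. Under these definitions Pass@8 can never be lower than Acc@8, which holds for every row where both scores are shown. The function names are illustrative only. `pass_at_k_unbiased` is the standard estimator 1 - C(n-c, k) / C(n, k) from the HumanEval/Codex evaluation. With n = k = 8 it reduces to "at least one correct". This is not necessarily how this leaderboard computes its scores.

```python
from math import comb

# Hypothetical helpers illustrating the assumed metric definitions.
# `results` holds one list per problem; each inner list has k booleans,
# one per sampled attempt (True = correct final answer).

def acc_at_k(results: list[list[bool]]) -> float:
    """Assumed Acc@k: mean per-attempt accuracy in percent, averaged over problems."""
    per_problem = [sum(attempts) / len(attempts) for attempts in results]
    return 100 * sum(per_problem) / len(per_problem)

def pass_at_k(results: list[list[bool]]) -> float:
    """Assumed Pass@k: percent of problems with at least one correct attempt."""
    return 100 * sum(any(attempts) for attempts in results) / len(results)

def pass_at_k_unbiased(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c correct."""
    # comb(n - c, k) is 0 when fewer than k incorrect samples exist,
    # so the estimate is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)

if __name__ == "__main__":
    # Toy example: 3 problems, 8 attempts each.
    results = [
        [True] * 8,                  # solved on every attempt
        [True, True] + [False] * 6,  # solved on 2 of 8 attempts
        [False] * 8,                 # never solved
    ]
    print(f"Acc@8  = {acc_at_k(results):.1f}")   # Acc@8  = 41.7
    print(f"Pass@8 = {pass_at_k(results):.1f}")  # Pass@8 = 66.7
```

The toy run shows why the two columns diverge. A problem solved on only 2 of 8 attempts counts fully toward Pass@8 but contributes just 25% toward Acc@8.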