
Mathematical Reasoning on AIME 2024 (AIME24, Math Avg)

Top result: 88.67 accuracy (DeepSeek-R1-0528)

[Chart: accuracy over time, Feb 7, 2026 to Apr 14, 2026]
Updated 3d ago

Evaluation Results

Date     Accuracy  Method  Links
2026.02  88.67     -       -
2026.02  88.67     -       -
2026.04  83.3      -       -
2026.04  76.7      -       -
2026.04  73.1      -       -
2026.04  69.6      -       -
2026.04  69.2      -       -
2026.04  68.97     -       -
2026.04  59.2      -       -
2026.04  57.1      -       -
2026.04  50.4      -       -
2026.04  49.6      -       -
2026.04  48.3      -       -
2026.04  44.83     -       -
2026.04  44.83     -       -
2026.04  43.8      -       -
2026.04  43.3      -       -
2026.04  42.5      -       -
2026.04  42.1      -       -
2026.04  41.38     -       -
2026.04  34.6      -       -
2026.04  34.6      -       -
2026.04  34.6      -       -
2026.04  34.48     -       -
2026.04  33.8      -       -
2026.04  31.03     -       -
2026.04  27.59     -       -
2026.04  26.09     -       -
2026.04  25.8      -       -
2026.04  25.4      -       -
2026.04  24.14     -       -
2026.04  20.69     -       -
2026.04  20.69     -       -
2026.04  20.69     -       -
2026.04  17.9      -       -
2026.04  17.24     -       -
2026.04  17.24     -       -
2026.04  17.1      -       -
2026.04  16.7      -       -
2026.04  16.7      -       -
2026.04  16.7      -       -
2026.04  16.4      -       -
2026.04  16.3      -       -
2026.04  15.4      -       -
2026.04  15.4      -       -
2026.04  14.6      -       -
2026.04  14.2      -       -
2026.04  13.8      -       -
2026.04  13.79     -       -
2026.04  13.69     -       -
2026.04  13.3      -       -
2026.04  13.3      -       -
2026.04  13.3      -       -
2026.04  12.29     -       -
2026.04  12.1      -       -
2026.04  10.34     -       -
2026.04  10        -       -
2026.04  10        -       -
2026.04  9.79      -       -
2026.04  9.17      -       -
2026.04  8.96      -       -
2026.04  8.33      -       -
2026.04  8.2       -       -
2026.04  7.92      -       -
2026.04  7.5       -       -
2026.04  6.9       -       -
2026.04  6.88      -       -
2026.04  6.88      -       -
2026.04  6.7       -       -
2026.04  6.46      -       -
2026.04  5.7       -       -
2026.04  5.7       -       -
2026.04  5.6       -       -
2026.04  5         -       -
2026.04  4.79      -       -
2026.04  4.2       -       -
2026.04  3.6       -       -
2026.04  3.1       -       -
2026.04  1.7       -       -
2026.04  1.5       -       -
2026.04  1.2       -       -
2026.04  1.1       -       -
2026.04  1.1       -       -
2026.04  0.9       -       -
2026.04  0.9       -       -
2026.04  0.6       -       -
2026.04  0.4       -       -
2026.04  0.4       -       -
2026.04  0.4       -       -
2026.04  0.4       -       -
2026.04  0.4       -       -
2026.04  0.3       -       -
2026.04  0.3       -       -
2026.04  0.1       -       -
2026.04  0.1       -       -
2026.04  0.1       -       -
2026.04  0.1       -       -
2026.04  0.1       -       -
2026.04  0.1       -       -
2026.04  0.1       -       -
Showing 100 of 118 rows