
Mathematical Reasoning on IMO-Bench

Accuracy: 57.02

F-1

[Chart: Accuracy over time, Jan 27, 2026 to May 21, 2026]
Updated 12d ago

Evaluation Results

Date       Accuracy
2026.01    57.02
2026.01    56.26
2026.01    55.58
2026.01    55.44
2026.01    54.71
2026.01    53.91
2026.01    53.35
2026.01    51.82
2026.01    50.99
2026.01    49.06
2026.01    47.73
2026.05    46.7
2026.05    44.3
2026.01    42.61
2026.05    41.9
2026.05    40.5
2026.05    38.8
2026.05    38
2026.05    37.7
2026.05    34.1
2026.05    33.9
2026.05    32.5
2026.01    30.54
2026.01    27.64
2026.01    27.56
2026.01    23.94
2026.01    21.85
2026.01    21.84
2026.01    20.45
2026.01    17.48
2026.05    13.3
2026.05    11.9
2026.05    11.2
2026.05    11
2026.05    11
2026.05    10.4
2026.05    9.9
2026.05    9.9
2026.05    9.9
2026.05    9.5
2026.05    9.4
2026.05    8.8
2026.05    8.7
2026.05    8.3
2026.05    8.3
2026.05    7.8
2026.05    7.4
2026.05    7.1
2026.05    7.1
2026.05    6.8
2026.05    6.1
2026.05    6.1
2026.05    5.9
2026.05    5.8
2026.05    5.4
2026.05    4.5
2026.05    2.6