
Math Problem Solving on 200 IMO-level math problems: IMO-AnswerBench, IMO-ProofBench, ArXivMath (test)

50.6 Pass@1 Accuracy

Meta-Harness

[Chart: Pass@1 Accuracy by date; y-axis ticks 21.896, 29.348, 36.8, 44.252; x-axis Mar 30, 2026]
Updated 18d ago

Evaluation Results

Date       Pass@1 Accuracy
2026.03    50.6
2026.03    48.9
2026.03    47.6
2026.03    47.2
2026.03    46.9
2026.03    46.7
2026.03    46.6
2026.03    46.3
2026.03    42.6
2026.03    42.3
2026.03    41.8
2026.03    40.4
2026.03    38.8
2026.03    38.1
2026.03    37.5
2026.03    37.1
2026.03    34.9
2026.03    34.4
2026.03    34.1
2026.03    32.8
2026.03    32.2
2026.03    31.7
2026.03    31.3
2026.03    31.1
2026.03    31.0
2026.03    30.4
2026.03    30.2
2026.03    29.2
2026.03    28.8
2026.03    28.6
2026.03    28.3
2026.03    27.1
2026.03    24.5
2026.03    24.5
2026.03    23.1
2026.03    23.0