
Mathematical Reasoning on Minerva (Accuracy %)

Chart: accuracy (%) of reported methods over time, Jun 30, 2025 to Apr 9, 2026. Highest result shown: WIST, 53.3%.
Updated 8d ago

Evaluation Results

Date      Accuracy (%)
2026.03   53.3
2026.03   51.5
2025.06   51.1
2026.03   48.5
2025.06   48.2
2025.06   48.2
2026.03   47.8
2026.03   47.8
2025.06   46.3
2026.03   44.1
2026.04   43.4
2026.03   43
2025.06   42.6
2025.06   42.4
2025.06   40.8
2026.04   40.8
2025.06   40.1
2026.04   39.3
2026.04   39
2025.06   38.2
2026.04   37.9
2025.06   36.8
2026.03   36.4
2026.03   35.7
2026.03   35.29
2026.03   35.2
2026.04   34.6
2026.03   34.4
2026.03   34.19
2026.03   34.19
2026.03   33.46
2026.03   33.46
2026.03   33.46
2026.04   33.1
2026.03   32.8
2026.03   32.72
2026.04   32.7
2026.03   32.5
2026.03   32.35
2025.06   32
2026.03   31.99
2026.03   31.5
2026.03   31.25
2026.03   30.2
2025.06   29.4
2026.03   26.2
2025.06   25.7
2026.04   25
2025.06   24.6
2025.06   23.8
2025.06   22.8
2026.03   22.7
2026.03   22.4
2026.03   21.8
2025.06   21.7
2026.03   21.32
2026.03   20.59
2026.03   20.59
2026.03   20.59
2026.03   20.22
2026.03   20.22
2026.03   20.22
2026.03   19.49
2026.03   19.12
2026.03   19.12
2026.03   18.5
2026.04   7.4
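Several entries are reported to two decimals (35.29, 34.19, 33.46, 20.59, ...). A minimal sketch of how such values can arise, assuming accuracy is exact-match correct answers divided by the number of test problems, and assuming the Minerva split has 272 problems (this page does not state the size; 272 is an assumption). Under that assumption each two-decimal value maps to a whole number of correct answers, e.g. 96/272 ≈ 35.29%. The helpers `accuracy_pct` and `implied_correct` are hypothetical names for illustration only.

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Exact-match accuracy in percent, rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie in [0, total]")
    return round(100 * correct / total, 2)


def implied_correct(acc_pct: float, total: int, tol: float = 0.005):
    """Return the whole number of correct answers consistent with a reported
    accuracy, or None if no integer count is within `tol` of it.
    The default tol is half of the last digit of a two-decimal value."""
    candidate = round(acc_pct * total / 100)
    if abs(100 * candidate / total - acc_pct) <= tol:
        return candidate
    return None


# Hypothetical check: ASSUMES the Minerva test set has 272 problems.
MINERVA_SIZE = 272
for reported in (35.29, 34.19, 33.46, 32.35, 31.99, 21.32, 20.59, 19.12):
    print(reported, implied_correct(reported, MINERVA_SIZE))
```

If a two-decimal entry had no integer count under the assumed size, that would suggest a different problem count or scoring rule for that entry.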