Multimodal mathematical reasoning on MathVision (test)

Accuracy: 60.3

OpenAI-o1

[Chart: accuracy over time, Jun 8, 2025 to Mar 28, 2026]
Updated 18d ago

Evaluation Results

Date      Accuracy   (unlabeled)   (unlabeled)
2025.06   60.3       -             -
2025.06   41.3       -             -
2026.01   39.3       -             -
2026.01   37.9       -             -
2025.06   37.9       -             -
2026.03   32.89      -             54.14
2026.03   32.57      -             52.94
2026.03   31.91      -             51.85
2026.01   31.2       195.6         -
2026.03   31.09      -             55.54
2026.01   30.6       204.8         -
2026.01   30.4       -             -
2025.06   30.4       -             -
2026.01   30.2       324.6         -
2026.01   29.9       692.8         -
2026.01   29.6       457.2         -
2026.03   29.28      -             51
2025.06   28.6       -             -
2026.03   28.43      -             53.06
2026.03   28.42      -             50.29
2026.03   28.29      -             50.18
2026.03   28.21      -             50.79
2026.03   27.63      -             51.8
2025.06   27.6       -             -
2026.03   27.3       -             51.4
2026.01   27.1       298.6         -
2026.01   26.8       323.5         -
2025.06   26.7       -             -
2025.06   26.6       -             -
2026.03   26.12      -             49.37
2026.01   25.6       443           -
2026.01   25.2       447.8         -
2025.06   25.1       -             -
2026.03   24.76      -             47.82
2025.06   24.3       -             -
2026.01   24         -             -
2026.01   23.4       349.2         -
2026.03   23.36      -             47.68
2026.03   23.03      -             51.98
2026.03   22.16      -             51.17
2025.06   21.9       -             -
2026.03   21.38      -             50.53
2026.01   21.2       450.6         -
2026.01   20.1       240.1         -
2025.06   19.7       -             -
2026.01   18.8       443           -
2025.06   16.7       -             -