
Multimodal Mathematical Reasoning on MathVerse (test)

64.9 Accuracy (ALL)

Human

[Chart: Accuracy (ALL) over time, Feb 19, 2025 to Jan 7, 2026]
Updated 18d ago

Evaluation Results

Date     (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)
2025.02  64.9   71.2   70.9   61.4   68.3   -      -      -
2026.01  52.4   -      -      -      -      -      -      -
2025.02  50.8   59.8   50.3   48     46.5   -      -      -
2026.01  50.6   -      -      -      -      201    -      -
2026.01  49.9   -      -      -      -      -      -      -
2026.01  49.8   -      -      -      -      283.9  -      -
2026.01  48.7   -      -      -      -      188.4  -      -
2026.01  47.9   -      -      -      -      398.4  -      -
2026.01  47.3   -      -      -      -      -      -      -
2026.01  46.9   -      -      -      -      388.9  -      -
2026.01  46.4   -      -      -      -      631.5  -      -
2026.01  46.3   -      -      -      -      -      -      -
2026.01  44.2   -      -      -      -      265.4  -      -
2026.01  43     -      -      -      -      286.3  -      -
2026.01  41.8   -      -      -      -      423.9  -      -
2026.01  39.5   -      -      -      -      364.3  -      -
2026.01  36.7   -      -      -      -      -      -      -
2026.01  36.3   -      -      -      -      121.6  -      -
2026.01  34.6   -      -      -      -      362.3  -      -
2026.01  31.9   -      -      -      -      388.9  -      -
2025.02  25.7   30.2   26.3   25.3   23.2   -      -      -
2025.02  22.9   27.3   24.9   24.5   21.7   -      -      -
2025.02  21.3   26     21.2   18.5   19.1   -      -      -
2025.02  20.7   25.5   21.8   20.9   21.2   -      -      -
2025.02  20.1   23.7   16.3   19     19.8   -      -      -
2025.02  19.3   23     23.2   20.2   18.4   -      -      -
2025.02  17.8   19.2   17.9   15.6   15.5   -      -      -
2025.02  16.6   20.9   20.7   17.2   14.6   -      -      -
2025.02  16.1   20.8   14.1   35.2   28.9   -      -      -
2025.02  12.7   17.1   12     12.6   12.7   -      -      -
2025.02  12.4   12.4   12.4   12.4   12.4   -      -      -
2025.02  12.2   12.3   12.9   12.5   14.8   -      -      -
2025.02  10.3   11.6   11.4   11.1   9.4    -      -      -
2026.03  -      -      -      -      -      -      45.56  47.68
2026.03  -      -      -      -      -      -      44.32  47.82
2026.03  -      -      -      -      -      -      46.88  49.37
2026.03  -      -      -      -      -      -      47.21  50.29
2026.03  -      -      -      -      -      -      45.94  50.18
2026.03  -      -      -      -      -      -      47.56  51
2026.03  -      -      -      -      -      -      47.75  50.79
2026.03  -      -      -      -      -      -      48.2   51.85
2026.03  -      -      -      -      -      -      49.87  51.4
2026.03  -      -      -      -      -      -      50.1   52.94
2026.03  -      -      -      -      -      -      40.86  50.53
2026.03  -      -      -      -      -      -      41.12  51.17
2026.03  -      -      -      -      -      -      42.1   51.98
2026.03  -      -      -      -      -      -      40.48  51.8
2026.03  -      -      -      -      -      -      41.36  53.06
2026.03  -      -      -      -      -      -      41.62  54.14
2026.03  -      -      -      -      -      -      46.82  55.54