Visual Math on MathVista
[Chart: Accuracy. Best result: 76.2 (DPE), as of Feb 26, 2026.]
Evaluation Results
Method          Details                    Date     Accuracy  Pass@1  Pass@8  Delta Pass@1  Average Score
DPE             Backbone=Qwen3-VL-8B-I...  2026.02  76.2      -       -       -             64.39
Qwen2.5-VL-72B  Parameters=72B             2026.02  74.8      -       -       -             61.9
Claude4-Sonnet  -                          2026.02  72.4      -       -       -             64.1
DeepEyes        -                          2026.02  70.1      -       -       -             -
GPT-4o          -                          2026.02  63.8      -       -       -             56.1
GPT5-Mini       -                          2026.02  59.6      -       -       -             53.8
DeepEyesV2      -                          2026.02  38.1      -       -       -             -
Base Model      evaluation_mode=Out-of...  2026.02  -         45      78      -             -
GRPO            evaluation_mode=Out-of...  2026.02  -         45      78      -             -
MIG             evaluation_mode=Out-of...  2026.02  -         45      71      0             -