Step-wise Verification on MMMU
[Chart: Macro F1 by date. Best result: Qwen2.5-VL, 59.2 Macro F1; latest data point Nov 28, 2025.]
Evaluation Results

Method            Model Type    Date     Macro F1
Qwen2.5-VL        Open-source   2025.11  59.2
Gemini-2.0-Flash  Proprietary   2025.11  58.5
TIM-PRM           Open-source   2025.11  58.3
Qwen3-VL          Open-source   2025.11  56.6
GPT-4o            Proprietary   2025.11  56.3
TIM-PRM           Open-source   2025.11  55.8
VisualPRM         Open-source   2025.11  54.9
GPT-4o-Mini       Proprietary   2025.11  53.6
Qwen2.5-VL        Open-source   2025.11  53.1
InternVL2.5       Open-source   2025.11  52.0
InternVL2.5       Open-source   2025.11  51.5
MM-PRM            Open-source   2025.11  51.2
Qwen3-VL          Open-source   2025.11  50.4
InternVL2.5       Open-source   2025.11  48.8
InternVL2.5       Open-source   2025.11  47.1
LLaVA-OV          Open-source   2025.11  46.1
LLaVA-OV          Open-source   2025.11  45.7
MiniCPM-V2.6      Open-source   2025.11  44.9
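The methods above are ranked by Macro F1. The following minimal Python sketch illustrates how such a score could be computed. It assumes that each reasoning step is labeled 1 (correct) or 0 (incorrect) and that the reported number is the unweighted mean of the two per-class F1 scores, scaled to 0 to 100. The page does not state the exact label scheme or averaging, so `f1_for_class`, `macro_f1`, and the 0/1 labels are hypothetical.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], label: int) -> float:
    """F1 score treating `label` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is 0 (or undefined), so report F1 = 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], labels=(0, 1)) -> float:
    """Unweighted mean of per-class F1 scores.

    Assumption: 1 = step judged correct, 0 = step judged incorrect.
    """
    scores = [f1_for_class(y_true, y_pred, c) for c in labels]
    return sum(scores) / len(scores)


# Hypothetical gold step labels vs. a verifier's step-wise judgments.
gold = [1, 1, 1, 0, 0, 1]
pred = [1, 1, 0, 0, 1, 1]
print(round(100 * macro_f1(gold, pred), 1))  # prints 62.5 (class 1: F1 0.75, class 0: F1 0.5)
```

Because each class counts equally, a verifier that simply marks every step as correct gets zero F1 on the incorrect-step class, which pulls its macro score down even when most steps are correct.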