Step-wise Verification on MMMU, MathVision, MathVerse-VO, DynaMath, WeMath Overall
Chart: Macro F1 over time (top result: Gemini-2.0-Flash, 62.3 Macro F1; Nov 28, 2025).
Evaluation Results
Method           | Model Type     | Date    | Macro F1
Gemini-2.0-Flash | Proprietary    | 2025.11 | 62.3
TIM-PRM          | Open-source... | 2025.11 | 61.7
Qwen3-VL         | Open-source... | 2025.11 | 61.1
Qwen2.5-VL       | Open-source... | 2025.11 | 60.5
GPT-4o           | Proprietary    | 2025.11 | 60.3
TIM-PRM          | Open-source... | 2025.11 | 60.3
GPT-4o-Mini      | Proprietary    | 2025.11 | 57.9
VisualPRM        | Open-source... | 2025.11 | 55.9
MM-PRM           | Open-source... | 2025.11 | 55.5
InternVL2.5      | Open-source... | 2025.11 | 52.6
LLaVA-OV         | Open-source... | 2025.11 | 52.3
Qwen3-VL         | Open-source... | 2025.11 | 51.1
Qwen2.5-VL       | Open-source... | 2025.11 | 51.0
InternVL2.5      | Open-source... | 2025.11 | 50.8
MiniCPM-V2.6     | Open-source... | 2025.11 | 50.4
InternVL2.5      | Open-source... | 2025.11 | 49.2
InternVL2.5      | Open-source... | 2025.11 | 48.0
LLaVA-OV         | Open-source... | 2025.11 | 44.4
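The page reports Macro F1 without defining it. Assuming the common convention for step-wise verification benchmarks, each reasoning step carries a binary gold label (correct or incorrect), the verifier predicts one label per step, and Macro F1 is the unweighted mean of the per-class F1 scores. The following is a minimal Python sketch under that assumption. The 1/0 label encoding, the function names `f1_for_class` and `macro_f1`, and the 0-100 scaling used to match the table are all hypothetical. It is not this leaderboard's official scoring code.

```python
from typing import Sequence, Tuple

# Hypothetical label encoding: 1 = step judged correct, 0 = step judged incorrect.

def f1_for_class(gold: Sequence[int], pred: Sequence[int], cls: int) -> float:
    """F1 score treating `cls` as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
    if tp == 0:
        # No true positives means precision or recall is zero (or undefined): report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: Sequence[int], pred: Sequence[int],
             classes: Tuple[int, ...] = (0, 1)) -> float:
    """Unweighted mean of per-class F1 over all step labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of steps")
    return sum(f1_for_class(gold, pred, c) for c in classes) / len(classes)


if __name__ == "__main__":
    gold = [1, 1, 0, 1, 0]  # hypothetical reference step labels
    pred = [1, 0, 0, 1, 1]  # hypothetical verifier predictions
    print(round(100 * macro_f1(gold, pred), 1))  # → 58.3
```

Averaging per-class F1 without weights gives the rarer "incorrect step" class as much influence as the common "correct step" class. Under this assumption, a verifier that approves every step cannot score well.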