Step-wise Verification on MathVerse VO
[Chart: best Macro F1 over time. Latest point: Gemini-2.0-Flash, 62.8 Macro F1, Nov 28, 2025.]
Evaluation Results
| Method | Model Type | Date | Macro F1 |
|---|---|---|---|
| Gemini-2.0-Flash | Proprietary | 2025.11 | 62.8 |
| TIM-PRM | Open-source... | 2025.11 | 61.9 |
| TIM-PRM | Open-source... | 2025.11 | 61.7 |
| Qwen3-VL | Open-source... | 2025.11 | 59.9 |
| GPT-4o | Proprietary | 2025.11 | 59.7 |
| Qwen2.5-VL | Open-source... | 2025.11 | 59.7 |
| MiniCPM-V2.6 | Open-source... | 2025.11 | 58.9 |
| GPT-4o-Mini | Proprietary | 2025.11 | 57.1 |
| MM-PRM | Open-source... | 2025.11 | 54.9 |
| InternVL2.5 | Open-source... | 2025.11 | 53.7 |
| LLaVA-OV | Open-source... | 2025.11 | 53 |
| VisualPRM | Open-source... | 2025.11 | 53 |
| InternVL2.5 | Open-source... | 2025.11 | 50.9 |
| Qwen3-VL | Open-source... | 2025.11 | 49.6 |
| InternVL2.5 | Open-source... | 2025.11 | 49.2 |
| Qwen2.5-VL | Open-source... | 2025.11 | 47.8 |
| InternVL2.5 | Open-source... | 2025.11 | 47.8 |
| LLaVA-OV | Open-source... | 2025.11 | 42.2 |
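
For readers unfamiliar with the metric, Macro F1 is the unweighted mean of the per-class F1 scores. The sketch below shows one way it could be computed for step-wise verification, assuming each reasoning step carries a binary label (1 = correct, 0 = incorrect). The label encoding, the function names `f1_for_class` and `macro_f1`, and the toy data are illustrative assumptions, not the benchmark's actual evaluation code.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> float:
    """F1 score treating `cls` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    # Convention used here: F1 is 0.0 when the class never appears or is predicted.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int],
             classes: Sequence[int] = (0, 1)) -> float:
    """Unweighted mean of per-class F1 (assumed binary step labels)."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical step labels for a single solution (1 = correct step, 0 = incorrect step).
gold = [1, 1, 1, 0, 0, 1]
pred = [1, 1, 0, 0, 1, 1]

# Scaled by 100 here on the assumption that the leaderboard reports percentages.
print(f"Macro F1: {100 * macro_f1(gold, pred):.1f}")
```

Because both classes are weighted equally, a verifier that labels every step as correct is penalized on the "incorrect" class even when most steps are in fact correct.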