Step-wise Verification on DynaMath
[Chart: Macro F1 over time. Best result: Gemini-2.0-Flash, 66.7 Macro F1 (Nov 28, 2025).]
Evaluation Results
| Method | Model Type | Date | Macro F1 |
|---|---|---|---|
| Gemini-2.0-Flash | Proprietary | 2025.11 | 66.7 |
| TIM-PRM | Open-source... | 2025.11 | 65.9 |
| TIM-PRM | Open-source... | 2025.11 | 64.3 |
| Qwen2.5-VL | Open-source... | 2025.11 | 62.9 |
| Qwen3-VL | Open-source... | 2025.11 | 61.4 |
| GPT-4o | Proprietary | 2025.11 | 59 |
| MM-PRM | Open-source... | 2025.11 | 58.1 |
| VisualPRM | Open-source... | 2025.11 | 57.5 |
| LLaVA-OV | Open-source... | 2025.11 | 57 |
| GPT-4o-Mini | Proprietary | 2025.11 | 56.7 |
| InternVL2.5 | Open-source... | 2025.11 | 51.8 |
| Qwen2.5-VL | Open-source... | 2025.11 | 51.3 |
| InternVL2.5 | Open-source... | 2025.11 | 50.8 |
| InternVL2.5 | Open-source... | 2025.11 | 50.4 |
| InternVL2.5 | Open-source... | 2025.11 | 50.3 |
| Qwen3-VL | Open-source... | 2025.11 | 48.7 |
| MiniCPM-V2.6 | Open-source... | 2025.11 | 46.7 |
| LLaVA-OV | Open-source... | 2025.11 | 44.7 |