Step-wise Verification on WeMath
Top score: 63.9 Macro F1 (TIM-PRM)
[Chart: Macro F1 by date; data point at Nov 28, 2025]
Evaluation Results
Method              Model Type        Date      Macro F1
------------------  ----------------  --------  --------
TIM-PRM             Open-source...    2025.11   63.9
GPT-4o              Proprietary       2025.11   63.3
Qwen3-VL            Open-source...    2025.11   62.7
Qwen2.5-VL          Open-source...    2025.11   62.3
Gemini-2.0-Flash    Proprietary       2025.11   58.7
GPT-4o-Mini         Proprietary       2025.11   58.5
Qwen3-VL            Open-source...    2025.11   58.3
TIM-PRM             Open-source...    2025.11   58
MiniCPM-V2.6        Open-source...    2025.11   57.4
LLaVA-OV            Open-source...    2025.11   57.3
MM-PRM              Open-source...    2025.11   56.5
VisualPRM           Open-source...    2025.11   55.1
Qwen2.5-VL          Open-source...    2025.11   54.2
LLaVA-OV            Open-source...    2025.11   52.5
InternVL2.5         Open-source...    2025.11   52.5
InternVL2.5         Open-source...    2025.11   52.5
InternVL2.5         Open-source...    2025.11   51.4
InternVL2.5         Open-source...    2025.11   50.8