Outcome Reasoning on CVQA Count
[Chart: F1 Mean (M') and F1 Mean (Y') by method; best result GPT-5, F1 Mean (M') = 0.792, May 17, 2025]
Evaluation Results
| Method | Method details | Link (date) | F1 Mean (M') | F1 Mean (Y') |
|---|---|---|---|---|
| GPT-5 | Model=GPT-5 | 2025.05 | 0.792 | 0.72 |
| GPT-o4 | Model=GPT-o4 | 2025.05 | 0.762 | 0.693 |
| Llama4-M | Model=Llama4-M | 2025.05 | 0.675 | 0.61 |
| DeepSeek | Model=DeepSeek | 2025.05 | 0.635 | 0.569 |
| Qwen3 | Model=Qwen3 | 2025.05 | 0.622 | 0.557 |
| Gemini2.5 | Model=Gemini2.5 | 2025.05 | 0.601 | 0.538 |
| Llama4-S | Model=Llama4-S | 2025.05 | 0.517 | 0.452 |