Outcome Reasoning on CVQA-Bool
[Chart: M' (F1 Score) and Y' (F1 Score) over time. Top entry: GPT-5, 81.2 M' (F1 Score), May 17, 2025.]
Evaluation Results
| Method | Date | M' (F1 Score) | Y' (F1 Score) |
|---|---|---|---|
| GPT-5 | 2025.05 | 81.2 | 74.5 |
| GPT-o4 | 2025.05 | 77.1 | 70.2 |
| Llama4-M | 2025.05 | 70.9 | 64.3 |
| DeepSeek | 2025.05 | 66.7 | 59.8 |
| Qwen3 | 2025.05 | 65.4 | 58.6 |
| Gemini2.5 | 2025.05 | 63.2 | 56.8 |
| Llama4-S | 2025.05 | 54.8 | 48.1 |