Outcome Reasoning on CVQA Count

0.792 F1 Mean (M')

GPT-5

Updated 5mo ago

Evaluation Results

Method	Links	F1 Mean (M')	
GPT-5 2025.05		0.792	0.72
GPT-o4 2025.05		0.762	0.693
Llama4-M 2025.05		0.675	0.61
DeepSeek 2025.05		0.635	0.569
Qwen3 2025.05		0.622	0.557
Gemini2.5 2025.05		0.601	0.538
Llama4-S 2025.05		0.517	0.452