Outcome Reasoning on COCO

77.8M' (F1 Mean)

GPT-5

Updated 5mo ago

Evaluation Results

Method	Links	F1 Mean	
GPT-5 2025.05		77.8	70.1
GPT-o4 2025.05		75.3	66.9
Llama4-M 2025.05		65.4	58.7
DeepSeek 2025.05		61.3	54.6
Qwen3 2025.05		60.1	53.4
Gemini2.5 2025.05		57.8	51.5
Llama4-S 2025.05		49.3	42.7