Outcome Reasoning on CLOMO

90.2 M' F1 Mean

GPT-5

Updated 5mo ago

Evaluation Results

Method	Links
GPT-5 2025.05		90.2	85.3
GPT-o4 2025.05		88.7	83.9
Llama4-M 2025.05		82.9	77.2
DeepSeek 2025.05		80.5	74.3
Gemini2.5 2025.05		79.3	72.8
Qwen3 2025.05		77.8	71.6
Llama4-S 2025.05		67.2	60.9