Outcome Reasoning on CRASS

92.1M' (F1 Mean)

GPT-5

Updated 5mo ago

Evaluation Results

Method	Links	F1 Mean
GPT-5 2025.05		92.1	88
GPT-o4 2025.05		90.5	86.2
Llama4-M 2025.05		84.9	79.5
DeepSeek 2025.05		82.9	77.1
Gemini2.5 2025.05		81.7	75.2
Qwen3 2025.05		80.5	73.9
Llama4-S 2025.05		70.1	63.5