Outcome Reasoning on CLOMO
[Chart: M' F1 Mean and Y' F1 Mean. Highlighted point: GPT-5, M' F1 Mean 90.2, May 17, 2025.]
Evaluation Results
Method      Date      M' F1 Mean   Y' F1 Mean
GPT-5       2025.05   90.2         85.3
GPT-o4      2025.05   88.7         83.9
Llama4-M    2025.05   82.9         77.2
DeepSeek    2025.05   80.5         74.3
Gemini2.5   2025.05   79.3         72.8
Qwen3       2025.05   77.8         71.6
Llama4-S    2025.05   67.2         60.9