Outcome Reasoning on CRASS

92.1 M' (F1 Mean)

GPT-5

[Chart: score axis ticks 69.22, 75.16, 81.1, 87.04; date axis: May 17, 2025]
Updated 1mo ago

Evaluation Results

Date      M' (F1 Mean)   (unlabeled)
2025.05   92.1           88
2025.05   90.5           86.2
2025.05   84.9           79.5
2025.05   82.9           77.1
2025.05   81.7           75.2
2025.05   80.5           73.9
2025.05   70.1           63.5