
Outcome Reasoning on Arithmetic

87.8 M' F1 Mean

GPT-5

[Chart: y-axis ticks 57.952, 65.701, 73.45, 81.199; date May 17, 2025]
Updated 1mo ago

Evaluation Results

Method    Links
2025.05   87.8   82.7
2025.05   85.8   80.9
2025.05   76.3   70.6
2025.05   74     67.5
2025.05   72.1   65.4
2025.05   69.8   63.2
2025.05   59.1   52.4