
Outcome Reasoning on Arithmetic

87.8 M' F1 Mean

GPT-5

[Chart: y-axis ticks 57.952, 65.701, 73.45, 81.199; x-axis date May 17, 2025]

Evaluation Results

Method   Links
2025.05  87.8  82.7
2025.05  85.8  80.9
2025.05  76.3  70.6
2025.05  74    67.5
2025.05  72.1  65.4
2025.05  69.8  63.2
2025.05  59.1  52.4