Outcome Reasoning on Open-Critic

75.3 (F1 Mean)

GPT-5

[Chart: score axis from 44.62 to 68.515; date axis: May 17, 2025]

Evaluation Results

Method   Links
2025.05  75.3  69.4
2025.05  73.8  67.5
2025.05  61.5  54.7
2025.05  57.2  50.6
2025.05  56    49.4
2025.05  53.7  47.3
2025.05  45.8  39.2