Outcome Reasoning on HumanEval Exe

Top result: GPT-5, 75.7 (F1 Mean)

[Chart: F1 Mean by date, May 17, 2025]

Evaluation Results

Date      F1 Mean   Score 2
2025.05   75.7      71.5
2025.05   73.4      66.5
2025.05   63.6      56.9
2025.05   59.4      52.7
2025.05   58.2      51.5
2025.05   55.8      49.4
2025.05   47.7      41.2