Programming on HumanEval
[Chart: top result is ChatGPT o1-mini at 79.5 ACC1 (Dec 19, 2024). Metrics shown: ACC1, ΔACC, Execution Failure Rate (%).]
Updated 4d ago
Evaluation Results
Method             Setting          Date     ACC1  ΔACC  Execution Failure Rate (%)
ChatGPT o1-mini    Model=o1-mini    2024.12  79.5  4.3   14.8
ChatGPT 4o         Model=4o         2024.12  72.6  6.8   21.9
ChatGPT 3.5-turbo  Model=3.5-turbo  2024.12  50.9  10.6  28.3