Programming on HumanEval (test)
Top result: o1-mini, ACC1 79.5
[Chart: ACC1, Delta ACC, and √ → X over time, Mar 15, 2023 to Dec 19, 2024]
Updated 4d ago
Evaluation Results
| Method | Details | Date | ACC1 | Delta ACC | √ → X |
|---|---|---|---|---|---|
| o1-mini | Model Family=ChatGPT | 2024.12 | 79.5 | 4.3 | 14.8 |
| 4o | Model Family=ChatGPT | 2024.12 | 72.6 | 6.8 | 21.9 |
| GPT-4 | number_of_shots=0-shot | 2023.03 | 67 | - | - |
| CodeT + GPT-3.5 | benchmark-specific tun... | 2023.03 | 65.8 | - | - |
| FP16 | Bits=16-bit, Backbone=... | 2023.11 | 57.31 | - | - |
| AFPQ (NF3-asym) | Bits=3-bit, Quantizati... | 2023.11 | 52.43 | - | - |
| 3.5-turbo | Model Family=ChatGPT | 2024.12 | 50.9 | 10.6 | 28.3 |
| GPT-3.5 | number_of_shots=0-shot | 2023.03 | 48.1 | - | - |
| AWQ (INT3) | Bits=3-bit, Quantizati... | 2023.11 | 47.56 | - | - |
| AWQ (NF3-sym) | Bits=3-bit, Quantizati... | 2023.11 | 45.12 | - | - |
| PaLM | number_of_shots=0-shot | 2023.03 | 26.2 | - | - |