Code Generation on HumanEval Llama-3-70B (test)
[Chart: QD-Score by date. Top result: QD-LLM, 26.3 (May 10, 2026). Available metrics: QD-Score, Median Score, Coverage, Accuracy (A).]
Updated 22d ago
Evaluation Results
| Method        | Backbone         | Date    | QD-Score | Median Score | Coverage | Accuracy (A) |
|---------------|------------------|---------|----------|--------------|----------|--------------|
| QD-LLM        | Llama-3-70B-I... | 2026.05 | 26.3     | 26.2         | 41       | 94           |
| CMA-ME (ad.)  | Llama-3-70B-I... | 2026.05 | 19.8     | 19.7         | 30       | -            |
| QDAIF         | Llama-3-70B-I... | 2026.05 | 18.6     | 18.5         | 28       | -            |
| EvoPrompt     | Llama-3-70B-I... | 2026.05 | 17.2     | 17.1         | 21       | -            |
| Best-of-N+MMR | Llama-3-70B-I... | 2026.05 | 16.4     | 16.3         | 24       | -            |
| Diverse Beam  | Llama-3-70B-I... | 2026.05 | 15.1     | 15           | 21       | -            |
| Nucleus Samp. | Llama-3-70B-I... | 2026.05 | 14.2     | 14.1         | 19       | -            |
| Vanilla ME    | Llama-3-70B-I... | 2026.05 | 13.8     | 13.7         | 18       | -            |