Code Generation on HumanEval (pass@2, pass@4, pass@8)
[Chart: Pass@2 by method. Top entry: DLE (top-p+top-k), 59.15, Apr 22, 2026. Selectable metrics: Pass@2, Pass@4, Pass@8.]
Evaluation Results
| Method | Model | Date | Pass@2 | Pass@4 | Pass@8 |
|---|---|---|---|---|---|
| DLE (top-p+top-k) | Llama3.2-3B-Inst... | 2026.04 | 59.15 | 67.07 | 76.83 |
| DLE (min-p) | Llama3.2-3B-Inst... | 2026.04 | 59.15 | 68.29 | 76.83 |
| DLE (ε-sampling)-PROBFIRST | Llama3.2-3B-Inst... | 2026.04 | 59.15 | 69.51 | 78.66 |
| DLE (ε-sampling)-RANDBRANCH | Llama3.2-3B-Inst... | 2026.04 | 57.93 | 65.24 | 73.17 |
| Self-consistency (ε-sampling) | Llama3.2-3B-Inst... | 2026.04 | 55.49 | 64.46 | 75.61 |
| DLE (ε-sampling)-DIVFIRST | Llama3.2-3B-Inst... | 2026.04 | 55.49 | 65.24 | 71.95 |
| Self-consistency (min-p) | Llama3.2-3B-Inst... | 2026.04 | 51.83 | 65.85 | 76.83 |
| Self-consistency (top-p+top-k) | Llama3.2-3B-Inst... | 2026.04 | 49.39 | 64.02 | 73.78 |
| Self-consistency | Llama3.2-3B-Inst... | 2026.04 | 46.34 | 57.93 | 70.12 |
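The page does not state how the Pass@k scores are computed. As a reference point only, the sketch below implements the commonly used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples generated for a problem and c is the number that pass the unit tests. The function names `pass_at_k` and `benchmark_pass_at_k` are hypothetical, and whether the listed methods use this estimator is an assumption.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: 1 - C(n-c, k) / C(n, k).

    n: samples generated, c: samples that pass all tests, k: budget.
    Assumption: this is the standard estimator, not necessarily the one
    used to produce the scores on this page.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # With fewer than k failing samples, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Mean pass@k over problems, as a percentage.

    results: one (n, c) pair per problem (hypothetical input format).
    """
    if not results:
        raise ValueError("results must be non-empty")
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, `benchmark_pass_at_k([(8, 0), (8, 8)], 2)` averages one problem with no passing samples and one where every sample passes.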