Code Generation on HumanEval compile (L2)
[Chart: Pass@1 and Pass@10. ShieldedCode: 19.23 Pass@1 (Jan 28, 2026)]
Updated 1mo ago
Evaluation Results
Method               Parameters  Date     Pass@1  Pass@10
ShieldedCode         -           2026.01  19.23   29.71
GPT-4o               -           2026.01  15.26   22.36
DeepSeekCoder-7B     7B          2026.01   7.94   12.18
StarCoder2-7B        7B          2026.01   6.23    6.82
Qwen-2.5-Coder-7B    7B          2026.01   6.08    5.94
Meta LLMCompiler-7B  7B          2026.01   5.97    7.28
CodeLlama            -           2026.01   5.19    7.89
GPT-3.5-Turbo        -           2026.01   4.87    6.74