Code Generation on HumanEval compile (L0)
[Chart: Pass@1 and Pass@10 over time. Top entry: ShieldedCode, Pass@1 26.95, Jan 28, 2026.]
Evaluation Results
Method               Parameters  Date     Pass@1  Pass@10
ShieldedCode         -           2026.01  26.95   35.68
GPT-4o               -           2026.01  22.58   31.47
DeepSeekCoder-7B     7B          2026.01  10.28   14.23
CodeLlama            -           2026.01  7.84    9.21
GPT-3.5-Turbo        -           2026.01  6.89    10.18
Meta LLMCompiler-7B  7B          2026.01  6.42    7.64
StarCoder2-7B        7B          2026.01  5.78    9.45
Qwen-2.5-Coder-7B    7B          2026.01  5.31    7.12