Code Generation on HumanEval compile (L3)
[Chart: Pass@1 and Pass@10. Best Pass@1: 14.71 (ShieldedCode), Jan 28, 2026]
Updated 1mo ago
Evaluation Results
Method               Parameters  Date     Pass@1  Pass@10
ShieldedCode         -           2026.01  14.71   22.83
GPT-4o               -           2026.01  11.89   18.99
DeepSeekCoder-7B     7B          2026.01  6.17    8.9
Meta LLMCompiler-7B  7B          2026.01  5.39    7
StarCoder2-7B        7B          2026.01  5.32    6.2
Qwen-2.5-Coder-7B    7B          2026.01  4.89    6.06
GPT-3.5-Turbo        -           2026.01  4.29    4.41
CodeLlama            -           2026.01  2.79    4.56