Code Generation on HumanEval compile (L1)
Top result: ShieldedCode, Pass@1 0.1847
[Chart: Pass@1 and Pass@10 by date; plotted point at Jan 28, 2026]
Updated 1mo ago
Evaluation Results
| Method | Parameters | Date | Pass@1 | Pass@10 |
|---|---|---|---|---|
| ShieldedCode | | 2026.01 | 0.1847 | 0.2794 |
| GPT-4o | | 2026.01 | 0.1743 | 0.2518 |
| DeepSeekCoder-7B | 7B | 2026.01 | 0.0689 | 0.1065 |
| GPT-3.5-Turbo | | 2026.01 | 0.0571 | 0.0795 |
| Meta LLMCompiler-7B | 7B | 2026.01 | 0.0538 | 0.0689 |
| Qwen-2.5-Coder-7B | 7B | 2026.01 | 0.0512 | 0.0728 |
| StarCoder2-7B | 7B | 2026.01 | 0.0491 | 0.0473 |
| CodeLlama | | 2026.01 | 0.0326 | 0.0534 |
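The page does not say how the Pass@1 and Pass@10 columns were computed. As a rough sketch, assuming they follow the unbiased pass@k estimator commonly used with HumanEval (1 - C(n-c, k) / C(n, k) per problem, averaged over problems), the calculation could look like the following. The function names `pass_at_k` and `mean_pass_at_k` and the `(n, c)` input layout are illustrative assumptions, not part of the leaderboard.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: samples generated for the problem (assumed input)
    c: samples that passed the check
    k: budget of samples considered
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset is all failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, `mean_pass_at_k([(10, 1), (10, 0)], k=1)` averages a problem with one passing sample out of ten and a problem with none. Under this estimator, Pass@10 computed from the same samples cannot fall below Pass@1, which is a useful sanity check when reading rows of the table.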