Code Generation on HumanEval (pass@1, Final Gap)
[Chart: pass@1 and Final Gap by date. Top result: 90.3 pass@1 (Pioneer Agent). Plotted date: Apr 10, 2026.]
Evaluation Results
Method          Configuration              Date     pass@1  Final Gap
Pioneer Agent   Model=Qwen3-8B, System...  2026.04  90.3    17.8
Pioneer Agent   Model=Qwen3-8B, System...  2026.04  89      17.8
Naive Baseline  Model=Qwen3-8B, System...  2026.04  74.4    17.8
Naive Baseline  Model=Qwen3-8B, System...  2026.04  72.5    17.8
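The leaderboard does not say how pass@1 was computed for these runs. As a minimal sketch, assuming the unbiased pass@k estimator introduced with the original HumanEval benchmark (draw n samples per problem, count the c that pass the unit tests, then average the per-problem estimates), the metric can be computed as below. The function names `pass_at_k` and `benchmark_pass_at_k` are hypothetical illustrations, not part of any evaluation harness used for this page.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of generated samples for the problem
    c: number of those samples that pass all unit tests
    k: sample budget being scored (k=1 gives pass@1)
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples fail, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 - P(all k drawn samples fail), sampling without replacement.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimates; results holds one (n, c) pair per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Toy data: three problems, ten samples each, with 3, 10, and 0 passing.
    toy = [(10, 3), (10, 10), (10, 0)]
    print(f"pass@1 = {100 * benchmark_pass_at_k(toy, k=1):.1f}")
```

For k = 1 the estimator reduces to c / n, so `pass_at_k(10, 3, 1)` evaluates to about 0.3; with a single greedy sample per problem (n = 1), benchmark-level pass@1 is simply the fraction of problems solved, which leaderboards usually report as a percentage.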