Code Generation on HumanEval (pass@1, pass@5)
Chart: Pass@1 over time, May 5, 2025 to Oct 4, 2025. Top result: CAA, Pass@1 53.05.
Updated 16d ago
Evaluation Results
| Method | Details | Date | Pass@1 | Pass@5 |
| --- | --- | --- | --- | --- |
| CAA | Backbone=Llama-3.2-3B-... | 2025.10 | 53.05 | 59.21 |
| RS | Backbone=Llama-3.2-3B-... | 2025.10 | 52.13 | 55.78 |
| STaR | Backbone=Llama-3.2-3B-... | 2025.10 | 52.13 | 57.35 |
| FA | Backbone=Llama-3.2-3B-... | 2025.10 | 48.17 | 55.95 |
| No Watermark | Backbone Model=Starcoder | 2025.05 | 43 | - |
| Unbiased | Backbone Model=Starcoder | 2025.05 | 36 | - |
| DiPmark | Backbone Model=Starcoder | 2025.05 | 36 | - |
| ToT | Backbone=Llama-3.2-3B-... | 2025.10 | 35.73 | 49.51 |
| E2E-LLM-Watermark | Backbone Model=Starcoder | 2025.05 | 34 | - |
| Unigram | Backbone Model=Starcoder | 2025.05 | 33 | - |
| Base Model | Backbone=Llama-3.2-3B-... | 2025.10 | 27.44 | 39.02 |
| KGW | Backbone Model=Starcoder | 2025.05 | 22 | - |
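This page does not state how the Pass@1 and Pass@5 scores were computed. For readers unfamiliar with the metric, the sketch below shows the unbiased pass@k estimator commonly used with HumanEval, pass@k = 1 - C(n-c, k) / C(n, k), where n is the number of samples generated per problem and c is the number that pass the unit tests. The function names `pass_at_k` and `benchmark_pass_at_k` and the `(n, c)` input format are illustrative assumptions, not taken from any of the listed methods.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed, k: budget being scored.
    Assumed convention: the probability that at least one of k samples
    drawn without replacement from the n generated ones is correct.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) counts the k-subsets containing no correct sample;
    # it is 0 when fewer than k incorrect samples exist.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Mean pass@k over all problems, as a percentage.

    results: hypothetical per-problem (n, c) pairs.
    """
    if not results:
        raise ValueError("results must not be empty")
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, `benchmark_pass_at_k([(5, 1), (5, 0)], k=1)` averages a per-problem estimate of 0.2 with one of 0.0. Scores from different rows are only comparable if they were computed under the same sampling setup, which the page does not specify.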