Code Generation on HumanEval (Performance %)
[Chart: Performance (%) by date. Plotted point: ATOM, 71.9, May 25, 2026.]
Updated 7d ago
Evaluation Results

Method         Configuration               Date      Performance (%)
ATOM           Backbone Model=Meta-Ll...   2026.05   71.9
LLM-Debate     Backbone Model=Meta-Ll...   2026.05   71.07
Complete       Backbone Model=Meta-Ll...   2026.05   70.25
Random         Backbone Model=Meta-Ll...   2026.05   69.42
Star           Backbone Model=Meta-Ll...   2026.05   68.6
ARG-Designer   Backbone Model=Meta-Ll...   2026.05   68.6
CoT            Backbone Model=Meta-Ll...   2026.05   67.77
Chain          Backbone Model=Meta-Ll...   2026.05   66.12
G-Designer     Backbone Model=Meta-Ll...   2026.05   66.12
Vanilla        Backbone Model=Meta-Ll...   2026.05   65.29
AgentPrune     Backbone Model=Meta-Ll...   2026.05   62.81
AgentDropout   Backbone Model=Meta-Ll...   2026.05   62.81