Code Generation on HumanEval+ (Score)
[Figure: Score chart (y-axis: Score). Highlighted point: ARES-RL, 34.76, May 22, 2026.]
Evaluation Results
Method            Backbone    Date      Score
ARES-RL           Qwen3-32B   2026.05   34.76
Webscale          Qwen3-32B   2026.05   33.54
NaturalReasoning  Qwen3-32B   2026.05   32.93
CPT               Qwen3-32B   2026.05   32.32
ARES-SFT          Qwen3-32B   2026.05   31.1