Coding on HumanEval+ (test)
Top result: Base, 67.7 Pass@1 (May 6, 2026)
[Chart: Pass@1 over time]
Updated 27d ago
Evaluation Results
| Method | Details | Date | Pass@1 |
|---|---|---|---|
| Base | Backbone=Qwen2.5-3B-In... | 2026.05 | 67.7 |
| KL-SFT | Backbone=Qwen2.5-3B-In... | 2026.05 | 67.1 |
| STM | Backbone=Qwen2.5-3B-In... | 2026.05 | 67.1 |
| Low-SFT | Backbone=Qwen2.5-3B-In... | 2026.05 | 65.9 |
| DFT | Backbone=Qwen2.5-3B-In... | 2026.05 | 65.9 |
| Iter-SFT | Backbone=Qwen2.5-3B-In... | 2026.05 | 65.9 |
| Anchored Learning | Backbone=Qwen2.5-3B-In... | 2026.05 | 64.6 |
| Self-SFT | Backbone=Qwen2.5-3B-In... | 2026.05 | 64 |
| SFT | Backbone=Qwen2.5-3B-In... | 2026.05 | 62.2 |
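
To make the gaps between the Pass@1 scores listed above easier to read, the sketch below converts each score into an approximate count of solved problems and a difference from the Base row. It assumes that HumanEval+ reuses the 164 HumanEval problems and that Pass@1 here is the percentage of problems solved with a single sample each; the page itself does not state either. The helper names `solved_count` and `delta_vs_base` are hypothetical.

```python
# Assumption (not stated on the page): HumanEval+ has 164 problems and
# Pass@1 is the percentage solved with one sample per problem.
NUM_PROBLEMS = 164

# Pass@1 values copied from the results listed above.
pass_at_1 = {
    "Base": 67.7,
    "KL-SFT": 67.1,
    "STM": 67.1,
    "Low-SFT": 65.9,
    "DFT": 65.9,
    "Iter-SFT": 65.9,
    "Anchored Learning": 64.6,
    "Self-SFT": 64.0,
    "SFT": 62.2,
}


def solved_count(score_percent, num_problems=NUM_PROBLEMS):
    """Approximate number of problems solved for a given Pass@1 percentage."""
    return round(score_percent / 100 * num_problems)


def delta_vs_base(scores, baseline="Base"):
    """Difference in percentage points between each method and the baseline."""
    return {name: round(score - scores[baseline], 1) for name, score in scores.items()}


if __name__ == "__main__":
    deltas = delta_vs_base(pass_at_1)
    for name, score in pass_at_1.items():
        print(f"{name:18s} {score:5.1f}  ~{solved_count(score)} solved  {deltas[name]:+.1f}")
```

Under these assumptions, a 0.6-point gap corresponds to roughly one problem, so neighboring rows often differ by only one or two solved problems.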