Coding Reasoning on HumanEval
[Chart: Accuracy (%) on HumanEval; top entry COPT at 96.34, dated May 19, 2026. Metrics available: Accuracy (%), Token Count]
Updated 14d ago
Evaluation Results
Method        Configuration              Date     Accuracy (%)  Token Count
COPT          Backbone=Qwen3-8B, Rea...  2026.05  96.34         1,842
COPT          Backbone=Qwen3-8B, Rea...  2026.05  94.51         1,023
CoT (Greedy)  Backbone=Qwen3-8B          2026.05  93.9          2,627
CoT           Backbone=Qwen3-8B          2026.05  92.68         2,368