Code Generation on BigCodeBench-I Hard
[Chart: Score over time. Top result: GPT-o1, 28.4 (Sep 26, 2025).]
Evaluation Results
Method                   Details                    Date     Score
GPT-o1                   Backbone=GPT-o1            2025.09  28.4
DeepSeek-V2.5-238B       Backbone=DeepSeek-V2.5...  2025.09  27
CRITIQUE-CODER           Backbone=Qwen3-8B, Thi...  2025.09  27
Qwen3-8B-RL              Backbone=Qwen3-8B, Thi...  2025.09  24.3
Baseline (Qwen3-8B)      Backbone=Qwen3-8B, Thi...  2025.09  23.6
Qwen3-4B-RL              Backbone=Qwen3-4B, Thi...  2025.09  23
CRITIQUE-CODER           Backbone=Qwen3-4B, Thi...  2025.09  23
DeepSeek-R1-Distill-14B  Backbone=DeepSeek-R1-D...  2025.09  20.9
Baseline (Qwen3-4B)      Backbone=Qwen3-4B, Thi...  2025.09  20.9
AceCoder-7B              Backbone=AceCoder-7B       2025.09  19.6
DeepCoder-14B            Backbone=DeepCoder-14B     2025.09  18.2