Code Generation on BigCodeBench-I Full
[Chart: Score by date. Top entry: GPT-o1, Score 50.4, Sep 26, 2025. Updated 1mo ago.]
Evaluation Results

Method                     Details                      Date      Score
GPT-o1                     Backbone=GPT-o1              2025.09   50.4
DeepSeek-V2.5-238B         Backbone=DeepSeek-V2.5...    2025.09   48.9
CRITIQUE-CODER             Backbone=Qwen3-8B, Thi...    2025.09   46.6
Baseline (Qwen3-8B)        Backbone=Qwen3-8B, Thi...    2025.09   44.6
Qwen3-8B-RL                Backbone=Qwen3-8B, Thi...    2025.09   44.5
AceCoder-7B                Backbone=AceCoder-7B         2025.09   43.3
CRITIQUE-CODER             Backbone=Qwen3-4B, Thi...    2025.09   43.1
Baseline (Qwen3-4B)        Backbone=Qwen3-4B, Thi...    2025.09   42
Qwen3-4B-RL                Backbone=Qwen3-4B, Thi...    2025.09   40.6
DeepCoder-14B              Backbone=DeepCoder-14B       2025.09   38.2
DeepSeek-R1-Distill-14B    Backbone=DeepSeek-R1-D...    2025.09   38.1