Code Editing on EditBench Core set
Top result: Claude Sonnet 4, Pass@1 66.67
[Chart: Pass@1 by date; x-axis tick at Mar 13, 2026]
Updated 1mo ago
Evaluation Results
Method                      Model Weight Type  Date     Pass@1
Claude Sonnet 4             Clos...            2026.03  66.67
GPT o4-mini                 Clos...            2026.03  57.41
SEMREP QwQ-32B              Open...            2026.03  57.41
Qwen3-Coder                 Open...            2026.03  55.56
GLM-4.6                     Open...            2026.03  55.56
Gemini 2.5 Pro              Clos...            2026.03  54.63
Finetuned QwQ-32B           Open...            2026.03  53.7
Qwen2.5-72B-Instruct        Open...            2026.03  53.7
QwQ-32B                     Open...            2026.03  50.93
SEMREP Qwen2.5-Coder-7B     Open...            2026.03  32.41
gemma-3-12b-it              Open...            2026.03  23.15
Finetuned Qwen2.5-Coder-7B  Open...            2026.03  23.15
Qwen2.5-Coder-7B            Open...            2026.03  18.52