Code Generation on EditBench
Top result: GPT o4-mini, Pass@1 53.7 (Mar 13, 2026)
Metrics: Pass@1, Consistency
Updated 1mo ago
Evaluation Results
Method                      Date      Pass@1   Consistency
GPT o4-mini                 2026.03   53.7     83.33
SEMREP                      2026.03   53.7     88.89
QwQ-32B (Mode=Finetuned)    2026.03   50.93    75
QwQ-32B (Mode=Base)         2026.03   47.22    76.85