Software Issue Resolution on SWE-rebench 60-task Python subset v2
[Chart: Pass@1 and Pass@3 by model (May 14, 2026); best Pass@1: 36.11, Claude Opus-4.5]
Evaluation Results
| Model | Configuration | Date | Pass@1 (%) | Pass@3 (%) |
|---|---|---|---|---|
| Claude Opus-4.5 | Evaluation Harness=min... | 2026.05 | 36.11 | 36.67 |
| GLM-4.7 | Evaluation Harness=min... | 2026.05 | 27.22 | 31.67 |
| MiniMax-M2.1 | Evaluation Harness=min... | 2026.05 | 26.11 | 31.67 |
| Gemini | Evaluation Harness=min... | 2026.05 | 25.56 | 33.33 |
| DeepSeek-V3.2 | Evaluation Harness=min... | 2026.05 | 23.33 | 31.67 |
| GPT-5.2 | Evaluation Harness=min... | 2026.05 | 20.56 | 23.33 |
| gpt-oss-120b | Evaluation Harness=min... | 2026.05 | 8.89 | 16.67 |
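The page does not state how Pass@1 and Pass@3 are computed. A common convention is the unbiased pass@k estimator popularized by the Codex paper (Chen et al., 2021): for a task attempted in n independent runs with c successful runs, pass@k = 1 - C(n-c, k) / C(n, k), averaged over tasks. The sketch below assumes that convention and a hypothetical n = 3 runs per task; the function names and the toy data are illustrative only, not the leaderboard's actual harness.

```python
from math import comb
from statistics import mean


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single task (assumed convention).

    n: number of independent runs on the task (assumed, e.g. 3)
    c: number of those runs whose patch passed the task's tests
    k: attempt budget being scored (1 <= k <= n)
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of runs contains at least one success.
        return 1.0
    # 1 - P(all k runs drawn without replacement are failures).
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(successes_per_task: list[int], n: int, k: int) -> float:
    """Mean per-task pass@k over the benchmark, as a percentage."""
    return 100.0 * mean(pass_at_k(n, c, k) for c in successes_per_task)


if __name__ == "__main__":
    # Hypothetical toy data: successful runs (out of 3) on four tasks.
    toy = [3, 1, 0, 0]
    print(f"Pass@1: {benchmark_score(toy, n=3, k=1):.2f}")  # 33.33
    print(f"Pass@3: {benchmark_score(toy, n=3, k=3):.2f}")  # 50.00
```

Under this convention, k = 1 reduces to the per-task success rate c/n, and k = n = 3 reduces to "solved in at least one of the three runs". Because pass@k never decreases as k grows, Pass@3 is always at least Pass@1, which holds for every row of the table above.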