Software Engineering on SWE-Bench Pro 1.0 (test)
Top result: Claude-Opus-4.5, Resolved Rate 51.6
[Figure: Resolved Rate over time; data point dated Feb 28, 2026]
Updated 1mo ago
Evaluation Results

Method             Details                    Date     Resolved Rate
Claude-Opus-4.5    Size=?, Scaffold=SWE-A...  2026.02  51.6
Claude-Sonnet-4.5  Size=?, Scaffold=SWE-A...  2026.02  50.5
Claude-Opus-4.5    Size=?, Scaffold=MiniS...  2026.02  50.2
Kimi-K2.5          Size=1000A32, Scaffold...  2026.02  47.3
DeepSeek-V3.2      Size=671A37, Scaffold=...  2026.02  46
GLM-4.7            Size=358A32, Scaffold=...  2026.02  45.1
Claude-Sonnet-4.5  Size=?, Scaffold=MiniS...  2026.02  43
Kimi-K2.5          Size=1000A32, Scaffold...  2026.02  42.8
Qwen3-Coder-Next   Size=80A3, Scaffold=SW...  2026.02  42.7
MiniMax-M2.1       Size=230A10, Scaffold=...  2026.02  40.8
GLM-4.7            Size=358A32, Scaffold=...  2026.02  39.4
MiniMax-M2.1       Size=230A10, Scaffold=...  2026.02  39.1
Qwen3-Coder-Next   Size=80A3, Scaffold=Mi...  2026.02  38.7
DeepSeek-V3.2      Size=671A37, Scaffold=...  2026.02  32.4