Agentic Coding on SWE-bench Verified
Top result: Claude-Sonnet-4.5, 77.2 Percentage Resolved
[Chart: Percentage Resolved over time, Dec 4, 2025 to Feb 5, 2026]
Updated 4d ago
Evaluation Results
| Method | Attributes | Date | Percentage Resolved |
|---|---|---|---|
| Claude-Sonnet-4.5 | Model Type=Proprietary... | 2025.12 | 77.2 |
| GPT-5 | Model Type=Proprietary... | 2025.12 | 74.9 |
| Kimi-K2-thinking | Model Type=Open Source... | 2025.12 | 71.3 |
| DeepSeek-V3.1-Nex-N1 | Model Type=Open Source... | 2025.12 | 70.6 |
| Minimax-M2 | Model Type=Open Source... | 2025.12 | 69.4 |
| GLM-4.6 | Model Type=Open Source... | 2025.12 | 68 |
| DeepSeek-V3.1 | Model Type=Open Source... | 2025.12 | 66 |
| Gemini-2.5-pro | Model Type=Proprietary... | 2025.12 | 59.6 |
| Qwen3-32B-Nex-N1 | Model Type=Open Source... | 2025.12 | 50.5 |
| Qwen3-30B-A3B-Nex-N1 | Model Type=Open Source... | 2025.12 | 29.7 |
| InternLM3-8B-Nex-N1 | Model Type=Open Source... | 2025.12 | 20.3 |
| ALIVE-Oracle | Backbone=Qwen3-30B-Ins... | 2026.02 | 17.6 |
| ALIVE-Self | Backbone=Qwen3-30B-Ins... | 2026.02 | 17.2 |
| GRPO (Scalar Reward) | Backbone=Qwen3-30B-Ins... | 2026.02 | 14.8 |
| FCP (Verbal Only) | Backbone=Qwen3-30B-Ins... | 2026.02 | 14 |
| SFT | Backbone=Qwen3-30B-Ins... | 2026.02 | 13.6 |
| Qwen3-32B | Model Type=Open Source... | 2025.12 | 12.9 |
| Base Model | Backbone=Qwen3-30B-Ins... | 2026.02 | 11.8 |
| Qwen3-30B-A3B | Model Type=Open Source... | 2025.12 | 9.6 |