Software Engineering on SWE-Bench (val)
[Chart: Acc and Completion by date (Feb 6, 2026); best Acc: 28.8, Claude 3.7 Sonnet]
Updated 4d ago
Evaluation Results
| Method | Method details | Date | Acc | Completion |
|---|---|---|---|---|
| Claude 3.7 Sonnet | Model Size=~100B+, Tra... | 2026.02 | 28.8 | 77.5 |
| DeepSeek-R1 | Model Size=37B, Traini... | 2026.02 | 8.8 | 30 |
| Rubric-Augmented Classifier | Model Size=4B, Trainin... | 2026.02 | 2.5 | 20 |
| Claude 3.5 Haiku | Model Size=~20B+, Trai... | 2026.02 | 1.3 | 2.5 |
| Mistral-7B | Model Size=7B, Trainin... | 2026.02 | 0 | 15 |
| Qwen3-4B (No RL) | Model Size=4B, Trainin... | 2026.02 | 0 | 0 |
| Baseline Classifier | Model Size=4B, Trainin... | 2026.02 | 0 | 2.5 |