Code Evaluation on SWE Verified Agentless
Top result: 57.6 pass@1 (DeepSeek-R1 0528 671B, Dec 15, 2025). Updated 4d ago.
Evaluation Results
| Method | Configuration (truncated) | Date | pass@1 |
|---|---|---|---|
| DeepSeek-R1 0528 671B | Parameters=671B, Think... | 2025.12 | 57.6 |
| Nemotron-Cascade 14B-Thinking (Test-Time Scaling) | Parameters=14B, Thinki... | 2025.12 | 53.8 |
| Gemini-2.5 Flash-Thinking | Thinking Mode=true, Se... | 2025.12 | 48.9 |
| Nemotron Cascade-8B (Test-Time Scaling) | Parameters=8B, Thinkin... | 2025.12 | 43.6 |
| Nemotron-Cascade 14B-Thinking | Parameters=14B, Thinki... | 2025.12 | 43.1 |
| Nemotron Cascade-8B | Parameters=8B, Thinkin... | 2025.12 | 37.2 |
| Qwen3 14B | Parameters=14B, Setup=... | 2025.12 | 27.4 |
| Qwen3 8B | Parameters=8B, Setup=A... | 2025.12 | 20.5 |
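
The results above include two models reported both with and without test-time scaling, so the gain from scaling can be read off by subtraction. The sketch below is only an illustration of that arithmetic: the scores are copied from the listing above, while the `scores` dictionary, the `TTS_SUFFIX` constant, and the `tts_gains` helper are hypothetical names introduced here and are not part of the leaderboard.

```python
# pass@1 scores copied from the results listed above.
scores = {
    "DeepSeek-R1 0528 671B": 57.6,
    "Nemotron-Cascade 14B-Thinking (Test-Time Scaling)": 53.8,
    "Gemini-2.5 Flash-Thinking": 48.9,
    "Nemotron Cascade-8B (Test-Time Scaling)": 43.6,
    "Nemotron-Cascade 14B-Thinking": 43.1,
    "Nemotron Cascade-8B": 37.2,
    "Qwen3 14B": 27.4,
    "Qwen3 8B": 20.5,
}

# Hypothetical convention: a scaled entry is the base name plus this suffix.
TTS_SUFFIX = " (Test-Time Scaling)"


def tts_gains(scores):
    """Return {base_name: gain in pass@1 points} for entries with a scaled variant."""
    gains = {}
    for name, score in scores.items():
        if name.endswith(TTS_SUFFIX):
            base = name[: -len(TTS_SUFFIX)]
            if base in scores:
                # Round to one decimal to match the precision of the listing.
                gains[base] = round(score - scores[base], 1)
    return gains


gains = tts_gains(scores)
for base, gain in gains.items():
    print(f"{base}: +{gain} pass@1 points with test-time scaling")
```

On these numbers the 14B model gains 10.7 points and the 8B model gains 6.4 points, which places the scaled 14B entry between DeepSeek-R1 0528 671B and Gemini-2.5 Flash-Thinking.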