Software Engineering Automation on SWE Multilingual
[Chart: Resolved. Top result: DeepSeek-V3.2, 70.2 (Dec 2, 2025)]
Updated 4d ago
Evaluation Results
Method                                Date      Resolved
DeepSeek-V3.2 (thinking mode=true)    2025.12   70.2
Claude-4.5-Sonnet                     2025.12   68
Kimi-K2 (thinking mode=true)          2025.12   61.1
MiniMax M2                            2025.12   56.5
GPT-5 High                            2025.12   55.3