Software Engineering on SWE-bench
[Chart: Resolve Rate over time, Mar 22, 2026 to May 6, 2026. Highlighted value: 82.4 (AgentOrchestra).]
Updated 27d ago
Evaluation Results
| Method | Configuration | Date | Resolve Rate |
| AgentOrchestra | | 2026.05 | 82.4 |
| Uno-Orchestra | | 2026.05 | 81.8 |
| Qwen3.5-122B-A10B | Framework/Scaffold=Ope... | 2026.04 | 67.4 |
| Qwen3.5-122B-A10B | Framework/Scaffold=Ope... | 2026.04 | 66.4 |
| AOrchestra | | 2026.05 | 61.7 |
| Qwen3.5-122B-A10B | Framework/Scaffold=Codex | 2026.04 | 61.2 |
| Nemotron 3 Super | Framework/Scaffold=Ope... | 2026.04 | 60.47 |
| Nemotron 3 Super | Framework/Scaffold=Ope... | 2026.04 | 59.2 |
| Devstral Small 2 | Model size=24B, Model... | 2026.04 | 56.4 |
| Devin | Context=Best Published... | 2026.03 | 55 |
| Nemotron 3 Super | Framework/Scaffold=Codex | 2026.04 | 53.73 |
| GPT-OSS-120B | Framework/Scaffold=Ope... | 2026.04 | 41.9 |
| gpt-oss-20b | Model size=20B, Model... | 2026.04 | 26 |
| xRouter | | 2026.05 | 24.8 |
| ColdStart-LLM | | 2026.05 | 14.6 |
| ARYA | Parameters=0 | 2026.03 | 0 |