Software Engineering on SWE-Bench-Verified (50 cases)
[Chart: Accuracy. Highlighted entry: SWE-agent + Claude 3.7 Sonnet w/ Review Heavy, accuracy 72, May 16, 2025]
Evaluation Results
Method                                           Open/Close Source  Date     Accuracy
SWE-agent + Claude 3.7 Sonnet w/ Review Heavy    Open...            2025.05  72
CodeStory Midwit Agent + swe-search              Clos...            2025.05  70
Openhands_04_15                                  Open...            2025.05  68
InfantAgent-Next + Claude-3.7-Sonnet             Open...            2025.05  66
AgentScope                                       Clos...            2025.05  66
CORTEXA                                          Clos...            2025.05  62
Amazon Q Developer Agent_2024_12_02              Clos...            2025.05  54
AutoCodeRover-v2.0 (Claude-3.5-Sonnet-20241022)  Open...            2025.05  52
devlo_2024_11_08                                 Clos...            2025.05  48
SWE-agent + SWE-agent-LM-32B                     Open...            2025.05  46
AppMap Navie v2                                  Open...            2025.05  12
Agentless Lite + O3 Mini (20250214)              Open...            2025.05  10