SWE-Bench

Benchmarks

| Task | Dataset | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Software Engineering Task Resolution | SWE-bench Verified | Resolution Rate | 73.3 | 63 |
| Agentic Coding | SWE-bench Verified | Percentage Resolved | 80.8 | 56 |
| Automated Software Engineering | SWE-bench Verified | Resolved Rate | 1,770 | 39 |
| Software Engineering | SWE-bench Lite | Speedup | 4.66 | 36 |
| Issue Resolution | SWE-bench Verified (test) | Pass Rate | 77.2 | 36 |
| Software Engineering | SWE-bench Verified | Accuracy | 62.6 | 33 |
| Software Engineering | SWE-bench verified (All) | Success Rate | 93.8 | 32 |
| Software Engineering | SWE-bench Verified | Resolution Rate | 83.8 | 32 |
| Software Engineering | SWE-bench Verified | Success Rate | 71.8 | 31 |
| Software Engineering Agent Task | SWE-Bench Pro | Pass@3 | 100 | 28 |
| Software Engineering Issue Resolution | SWE-bench Verified | Resolution Rate | 67.5 | 26 |
| Function-level Code Localization | SWE-bench Live Lite | Acc@1 | 74.8 | 25 |
| File-level Code Localization | SWE-bench Live Lite | Acc@1 | 82.1 | 25 |
| Function-level Code Localization | SWE-bench Verified (Lite) | Acc@1 | 83.4 | 25 |
| File-level Code Localization | SWE-bench Verified Lite | Accuracy@1 | 91.9 | 25 |
| Code Localization | SWE-bench Verified (test) | File Precision | 86.38 | 24 |
| Software Engineering | SWE-Bench Verified | Pass Rate | 72 | 20 |
| Software Engineering | SWE-Bench Multilingual 1.0 (test) | Resolution Rate | 75.2 | 20 |
| Software Engineering / Issue Resolving | SWE-bench Verified | Pass@1 | 66 | 19 |
| Automated Software Engineering | SWE-bench Lite | Resolve Rate | 33 | 19 |
| Software Engineering Task Completion | SWE-bench | S@50 Success Rate | 90.2 | 18 |
| Software Engineering Problem Solving | SWE-Bench C# | Resolve Rate | 47.3 | 18 |
| Software Engineering | SWE-Bench Verified | Pass@1 | 84 | 18 |
| Agentic Uncertainty Elicitation | SWE-bench Pro (test) | AUROC | 0.68 | 18 |
| Agentic Coding | SWE-Bench Verified | Pass@1 | 79.6 | 17 |
Showing 25 of 114 rows