SWE-Bench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Automated Software Engineering | SWE-bench Verified | Resolved Rate 1,770 | 39 |
| Issue Resolution | SWE-bench Verified (test) | Pass Rate 77.2 | 36 |
| Software Engineering | SWE-bench Verified | Accuracy 62.6 | 33 |
| Agentic Coding | SWE-bench Verified | Percentage Resolved 77.2 | 33 |
| Software Engineering | SWE-bench Verified | Success Rate 71.8 | 29 |
| Software Engineering | SWE-bench Verified | Resolution Rate 83.8 | 26 |
| Function-level Code Localization | SWE-bench Live Lite | Acc@1 74.8 | 25 |
| File-level Code Localization | SWE-bench Live Lite | Acc@1 82.1 | 25 |
| Function-level Code Localization | SWE-bench Verified (Lite) | Acc@1 83.4 | 25 |
| File-level Code Localization | SWE-bench Verified Lite | Accuracy@1 91.9 | 25 |
| Code Localization | SWE-bench Verified (test) | File Precision 86.38 | 24 |
| Software Engineering Task Resolution | SWE-bench Verified | Resolution Rate 57.4 | 23 |
| Software Engineering | SWE-Bench Multilingual 1.0 (test) | Resolution Rate 75.2 | 20 |
| Automated Software Engineering | SWE-bench Lite | Resolve Rate 33 | 19 |
| Software engineering | SWE-Bench Verified | Pass@1 84 | 18 |
| Agentic Uncertainty Elicitation | SWE-bench Pro (test) | AUROC 0.68 | 18 |
| Function-level Localization | SWE-Bench Lite latest (test) | NDCG@5 64.34 | 16 |
| Module-level Localization | SWE-Bench-Lite latest (test) | NDCG@5 77.73 | 16 |
| File-level Localization | SWE-Bench-Lite latest (test) | NDCG@1 77.74 | 16 |
| Function-level Code Localization | SWE-bench lite | Acc@5 73.36 | 16 |
| Module-level Code Localization | SWE-bench lite | Acc@5 86.5 | 16 |
| File-level Code Localization | SWE-bench lite | Acc@1 77.74 | 16 |
| Software Engineering Issue Resolution | SWE-Bench Lite | Resolution Rate 73.5 | 16 |
| Code Generation | SWE-bench Lite | GF Precision 70 | 14 |
| Software Engineering | SWE-Bench Pro 1.0 (test) | Resolved Rate 51.6 | 14 |
Showing 25 of 71 rows