SWE-Bench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Automated Software Engineering | SWE-bench Verified | Resolved Rate: 1,770 | 39 |
| Issue Resolution | SWE-bench Verified (test) | Pass Rate: 77.2 | 36 |
| Function-level Code Localization | SWE-bench Live Lite | Acc@1: 74.8 | 25 |
| File-level Code Localization | SWE-bench Live Lite | Acc@1: 82.1 | 25 |
| Function-level Code Localization | SWE-bench Verified (Lite) | Acc@1: 83.4 | 25 |
| File-level Code Localization | SWE-bench Verified Lite | Accuracy@1: 91.9 | 25 |
| Code Localization | SWE-bench Verified (test) | File Precision: 86.38 | 24 |
| Automated Software Engineering | SWE-bench Lite | Resolve Rate: 33 | 19 |
| Agentic Coding | SWE-bench Verified | Percentage Resolved: 77.2 | 19 |
| Software engineering | SWE-Bench Verified | Pass@1: 84 | 18 |
| Agentic Uncertainty Elicitation | SWE-bench Pro (test) | AUROC: 0.68 | 18 |
| Software Engineering Task Resolution | SWE-bench Verified | Resolution Rate: 0.704 | 17 |
| Function-level Localization | SWE-Bench Lite latest (test) | NDCG@5: 64.34 | 16 |
| Module-level Localization | SWE-Bench-Lite latest (test) | NDCG@5: 77.73 | 16 |
| File-level Localization | SWE-Bench-Lite latest (test) | NDCG@1: 77.74 | 16 |
| Function-level Code Localization | SWE-bench lite | Acc@5: 73.36 | 16 |
| Module-level Code Localization | SWE-bench lite | Acc@5: 86.5 | 16 |
| File-level Code Localization | SWE-bench lite | Acc@1: 77.74 | 16 |
| Software Engineering Issue Resolution | SWE-Bench Lite | Resolution Rate: 73.5 | 16 |
| Code Agent | SWE-Bench Verified | Score: 0.809 | 13 |
| Software Engineering Task Resolution | SWE-BENCH LIVE | Resolution Rate: 24.7 | 11 |
| Software Engineering | SWE-bench Verified | Resolution Rate: 0.402 | 9 |
| Software Engineering | SWE-Bench Pro (public) | Resolve Rate (Pass@1): 59 | 9 |
| Issue Resolving | SWE-bench lite | Rounds: 5 | 9 |
| Software Engineering Issue Solving | SWE-Bench Verified | Accuracy: 46 | 8 |

Showing 25 of 45 rows