
SWE-rebench

Benchmarks

Task Name                  | Dataset Name                         | Metric        | SOTA Result | Trend
Software Engineering       | SWE-rebench January 2026 (test)      | Resolved Rate | 52.9        | 8
Software Issue Resolution  | SWE-rebench 60-task Python subset v2 | Pass@1        | 36.11       | 7
Software Engineering Tasks | SWE-rebench subset V2 (test)         | Resolved Rate | 43.7        | 4
Software Issue Resolution  | SWE-rebench full Python v2           | Pass@1        | 22.36       | 1