| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Automated Software Engineering | SWE-bench Verified | Resolved Rate | 1,770 | 39 |
| Issue Resolution | SWE-bench Verified (test) | Pass Rate | 77.2 | 36 |
| Software Engineering | SWE-bench Verified | Accuracy | 62.6 | 33 |
| Agentic Coding | SWE-bench Verified | Percentage Resolved | 77.2 | 33 |
| Software Engineering | SWE-bench Verified | Success Rate | 71.8 | 29 |
| Software Engineering | SWE-bench Verified | Resolution Rate | 83.8 | 26 |
| Function-level Code Localization | SWE-bench Live Lite | Acc@1 | 74.8 | 25 |
| File-level Code Localization | SWE-bench Live Lite | Acc@1 | 82.1 | 25 |
| Function-level Code Localization | SWE-bench Verified (Lite) | Acc@1 | 83.4 | 25 |
| File-level Code Localization | SWE-bench Verified Lite | Accuracy@1 | 91.9 | 25 |
| Code Localization | SWE-bench Verified (test) | File Precision | 86.38 | 24 |
| Software Engineering Task Resolution | SWE-bench Verified | Resolution Rate | 57.4 | 23 |
| Software Engineering | SWE-Bench Multilingual 1.0 (test) | Resolution Rate | 75.2 | 20 |
| Automated Software Engineering | SWE-bench Lite | Resolve Rate | 33 | 19 |
| Software Engineering | SWE-Bench Verified | Pass@1 | 84 | 18 |
| Agentic Uncertainty Elicitation | SWE-bench Pro (test) | AUROC | 0.68 | 18 |
| Function-level Localization | SWE-Bench Lite latest (test) | NDCG@5 | 64.34 | 16 |
| Module-level Localization | SWE-Bench-Lite latest (test) | NDCG@5 | 77.73 | 16 |
| File-level Localization | SWE-Bench-Lite latest (test) | NDCG@1 | 77.74 | 16 |
| Function-level Code Localization | SWE-bench lite | Acc@5 | 73.36 | 16 |
| Module-level Code Localization | SWE-bench lite | Acc@5 | 86.5 | 16 |
| File-level Code Localization | SWE-bench lite | Acc@1 | 77.74 | 16 |
| Software Engineering Issue Resolution | SWE-Bench Lite | Resolution Rate | 73.5 | 16 |
| Code Generation | SWE-bench Lite | GF Precision | 70 | 14 |
| Software Engineering | SWE-Bench Pro 1.0 (test) | Resolved Rate | 51.6 | 14 |
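Several localization rows report top-k ranking metrics (Acc@k, NDCG@k) without defining them. The sketch below shows one common way these are computed for a single issue, assuming binary relevance and assuming that Acc@k counts a hit only when *all* gold locations appear in the top-k predictions; individual leaderboard entries may use other definitions (for example, "any gold location in the top k"). Leaderboard figures such as 74.8 are presumably per-issue scores like these averaged over the benchmark and expressed as percentages. The function names `acc_at_k` and `ndcg_at_k` and the file paths in the usage lines are hypothetical.

```python
import math
from typing import Sequence, Set


def acc_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """1.0 if every gold location is in the top-k predictions, else 0.0.

    Assumption: the 'all gold locations retrieved' criterion.
    """
    top_k = set(ranked[:k])
    return 1.0 if gold and gold <= top_k else 0.0


def ndcg_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    # Drop repeated predictions (keeping first occurrence) so a gold
    # location cannot be counted twice and push the score above 1.0.
    unique = list(dict.fromkeys(ranked))
    dcg = sum(
        1.0 / math.log2(rank + 2)  # 0-based rank, so position 1 -> 1/log2(2) = 1
        for rank, loc in enumerate(unique[:k])
        if loc in gold
    )
    # Ideal ranking places all gold locations first (at most k of them).
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(gold), k)))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical predictions for one issue (paths are made up).
predicted = ["src/utils.py", "src/core.py", "tests/test_core.py"]
gold = {"src/core.py"}

print(acc_at_k(predicted, gold, k=1))  # 0.0: the gold file is ranked 2nd
print(acc_at_k(predicted, gold, k=5))  # 1.0
print(round(ndcg_at_k(predicted, gold, k=5), 4))  # 1/log2(3), about 0.6309
```

Because NDCG@k discounts hits by rank while Acc@k only checks membership in the top k, the same ranking can score 1.0 on Acc@5 but noticeably less on NDCG@5, which is one reason the two metrics are not directly comparable across rows.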