| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Automated Software Engineering | SWE-bench Verified | Resolved Rate 1,770 | 39 | |
| Issue Resolution | SWE-bench Verified (test) | Pass Rate 77.2 | 36 | |
| Function-level Code Localization | SWE-bench Live Lite | Acc@1 74.8 | 25 | |
| File-level Code Localization | SWE-bench Live Lite | Acc@1 82.1 | 25 | |
| Function-level Code Localization | SWE-bench Verified (Lite) | Acc@1 83.4 | 25 | |
| File-level Code Localization | SWE-bench Verified Lite | Accuracy@1 91.9 | 25 | |
| Code Localization | SWE-bench Verified (test) | File Precision 86.38 | 24 | |
| Automated Software Engineering | SWE-bench Lite | Resolve Rate 33 | 19 | |
| Agentic Coding | SWE-bench Verified | Percentage Resolved 77.2 | 19 | |
| Software engineering | SWE-Bench Verified | Pass@1 84 | 18 | |
| Agentic Uncertainty Elicitation | SWE-bench Pro (test) | AUROC 0.68 | 18 | |
| Software Engineering Task Resolution | SWE-bench Verified | Resolution Rate 0.704 | 17 | |
| Function-level Localization | SWE-Bench Lite latest (test) | NDCG@5 64.34 | 16 | |
| Module-level Localization | SWE-Bench-Lite latest (test) | NDCG@5 77.73 | 16 | |
| File-level Localization | SWE-Bench-Lite latest (test) | NDCG@1 77.74 | 16 | |
| Function-level Code Localization | SWE-bench lite | Acc@5 73.36 | 16 | |
| Module-level Code Localization | SWE-bench lite | Acc@5 86.5 | 16 | |
| File-level Code Localization | SWE-bench lite | Acc@1 77.74 | 16 | |
| Software Engineering Issue Resolution | SWE-Bench Lite | Resolution Rate 73.5 | 16 | |
| Code Agent | SWE-Bench Verified | Score 0.809 | 13 | |
| Software Engineering Task Resolution | SWE-BENCH LIVE | Resolution Rate 24.7 | 11 | |
| Software Engineering | SWE-bench Verified | Resolution Rate 0.402 | 9 | |
| Software Engineering | SWE-Bench Pro (public) | Resolve Rate (Pass@1) 59 | 9 | |
| Issue Resolving | SWE-bench lite | Rounds 5 | 9 | |
| Software Engineering Issue Solving | SWE-Bench Verified | Accuracy 46 | 8 | |