| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Software Engineering Task Resolution | SWE-bench Verified | Resolution Rate | 73.3 | 63 |
| Agentic Coding | SWE-bench Verified | Percentage Resolved | 80.8 | 56 |
| Automated Software Engineering | SWE-bench Verified | Resolved Rate | 1,770 | 39 |
| Software Engineering | SWE-bench Lite | Speedup | 4.66 | 36 |
| Issue Resolution | SWE-bench Verified (test) | Pass Rate | 77.2 | 36 |
| Software Engineering | SWE-bench Verified | Accuracy | 62.6 | 33 |
| Software Engineering | SWE-bench verified (All) | Success Rate | 93.8 | 32 |
| Software Engineering | SWE-bench Verified | Resolution Rate | 83.8 | 32 |
| Software Engineering | SWE-bench Verified | Success Rate | 71.8 | 31 |
| Software Engineering Agent Task | SWE-Bench Pro | Pass@3 | 100 | 28 |
| Software Engineering Issue Resolution | SWE-bench Verified | Resolution Rate | 67.5 | 26 |
| Function-level Code Localization | SWE-bench Live Lite | Acc@1 | 74.8 | 25 |
| File-level Code Localization | SWE-bench Live Lite | Acc@1 | 82.1 | 25 |
| Function-level Code Localization | SWE-bench Verified (Lite) | Acc@1 | 83.4 | 25 |
| File-level Code Localization | SWE-bench Verified Lite | Accuracy@1 | 91.9 | 25 |
| Code Localization | SWE-bench Verified (test) | File Precision | 86.38 | 24 |
| Software Engineering | SWE-Bench Verified | Pass Rate | 72 | 20 |
| Software Engineering | SWE-Bench Multilingual 1.0 (test) | Resolution Rate | 75.2 | 20 |
| Software Engineering / Issue Resolving | SWE-bench Verified | Pass@1 | 66 | 19 |
| Automated Software Engineering | SWE-bench Lite | Resolve Rate | 33 | 19 |
| Software Engineering Task Completion | SWE-bench | S@50 Success Rate | 90.2 | 18 |
| Software Engineering Problem Solving | SWE-Bench C# | Resolve Rate | 47.3 | 18 |
| Software engineering | SWE-Bench Verified | Pass@1 | 84 | 18 |
| Agentic Uncertainty Elicitation | SWE-bench Pro (test) | AUROC | 0.68 | 18 |
| Agentic Coding | SWE-Bench Verified | Pass@1 | 79.6 | 17 |