| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|---|
| Software Engineering | SWE-rebench January 2026 (test) | Resolved Rate: 52.9 | 8 |
| Software Issue Resolution | SWE-rebench 60-task Python subset v2 | Pass@1: 36.11 | 7 |
| Software Engineering Tasks | SWE-rebench subset V2 (test) | Resolved Rate: 43.7 | 4 |
| Software Issue Resolution | SWE-rebench full Python v2 | Pass@1: 22.36 | 1 |