| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| SWE-rebench 60-task Python subset v2 | Claude Opus-4.5 | Pass@1 | 36.11 | 7 | 19d ago |
| SWE-bench Lite (test) | | Resolved Issues Count | 185 | 6 | 23d ago |
| SWE-bench Lite | CADMAS-CTX | Overall Resolution Success | 31.4 | 4 | 1mo ago |
| SWE-rebench full Python v2 | Orchard-SWE | Pass@1 | 22.36 | 1 | 19d ago |