| SWE-bench Lite | Draft-OPD | Speedup 4.66 | | 36 | 5d ago |
| SWE-bench Verified | JoyAI-LLM Flash | Accuracy 62.6 | | 33 | 1mo ago |
| SWE-bench verified (All) | ACE | Success Rate 93.8 | | 32 | 1mo ago |
| SWE-bench Verified | MemCoder | Resolution Rate 83.8 | | 32 | 2d ago |
| SWE-bench Verified | OpenHands | Success Rate 71.8 | | 31 | 1mo ago |
| SWE Lite | Draft-OPD | Throughput (tok/s) 10,538 | | 30 | 5d ago |
| Commit0-Lite | STORM-Combined | Score 88.2 | | 24 | 13d ago |
| SWE-Bench Verified | HyperAgent + Librarian | Pass Rate 72 | | 20 | 6d ago |
| SWE-Bench Multilingual 1.0 (test) | | Resolution Rate 75.2 | | 20 | 3mo ago |
| SWE-Bench Verified | Mini-SWE | Pass@1 84 | | 18 | 1mo ago |
| SWE Verified | | Resolution Rate 77.2 | | 17 | 3mo ago |
| SWE-bench | | Resolve Rate 82.4 | | 16 | 27d ago |
| PaperBench Code (dev) | STORM-Combined | Score 78.2 | | 15 | 13d ago |
| SWE-bench Multilingual | SE-agent-Reflect | Pass@1 21 | | 14 | 21d ago |
| SWE-Bench Pro 1.0 (test) | | Resolved Rate 51.6 | | 14 | 3mo ago |
| SWE-Bench Pro (public) | CCA | Resolve Rate (Pass@1) 59 | | 13 | 15d ago |
| SWE-Bench-Verified (50 cases) | | Accuracy 72 | | 12 | 1mo ago |
| SWE-Bench Verified | | Resolution Rate (%) 86.2 | | 10 | 15d ago |
| PaperBench | Claude Sonnet 4.5 | Score 66.8 | | 9 | 2mo ago |
| SWE-bench Lite (300 instances) | | Misalignment Rate 0 | | 8 | 6d ago |
| SWE-rebench January 2026 (test) | | Resolved Rate 52.9 | | 8 | 3mo ago |
| SWE-bench Lite | TOOLSELF | Accuracy 16.1 | | 8 | 3mo ago |
| SWE-Bench Verified | | SWE-Agent Score 78.2 | | 7 | 3mo ago |
| SWE-Bench (val) | | Acc 28.8 | | 7 | 3mo ago |
| SWE-Bench Lite 300-issue subset | Agentless-1.5 | Accuracy 32 | | 6 | 1mo ago |