| Dataset Name | SOTA Method | Metric | Value | Trend | Last Updated |
|---|---|---|---|---|---|
| SWE-bench Verified | | Percentage Resolved | 80.8 | 56 | 22d ago |
| Terminal Bench 2.0 | | Pass@1 | 59.1 | 18 | 6d ago |
| TerminalBench 2 | | Pass Rate | 81.8 | 17 | 2mo ago |
| SWE-Bench Verified | | Pass@1 | 79.6 | 17 | 6d ago |
| Project dev (test) | | Tau^2 | 0.881 | 13 | 3mo ago |
| SWE-Bench Multilingual | MiMo-V2-Flash | Accuracy | 71.7 | 13 | 6d ago |
| SWE-bench Pro | | Pass@1 | 52.6 | 9 | 6d ago |
| TerminalBench | LongCat-Flash-Lite | Accuracy | 0.3375 | 7 | 2mo ago |
| SWE-Bench | LongCat-Flash-Lite | Accuracy | 54.4 | 7 | 2mo ago |
| LiveCodeBench | ALIVE-Self | Pass@1 | 56 | 6 | 3mo ago |
| SWE-bench Multilingual | | Pass@1 | 73.3 | 5 | 6d ago |
| U-Artifacts | | Pass@1 | 57.8 | 5 | 2mo ago |
| AInstein-SWE-Bench | | Pass@1 | 42.8 | 5 | 2mo ago |
| Multi-SWE-Bench | | Pass@1 | 44.3 | 5 | 2mo ago |
| Coding unseen tasks (test) | SGE | Pass@1 | 29.2 | 3 | 3mo ago |
| PRDBench | LongCat-Flash-Lite | Accuracy | 39.63 | 2 | 3mo ago |