| Dataset Name | SOTA Method | Metric | Score | Trend | |
|---|---|---|---|---|---|
| SWE-bench Verified | | Percentage Resolved | 77.2 | 33 | 1mo ago |
| TerminalBench 2 | | Pass Rate | 81.8 | 17 | 18d ago |
| Project dev (test) | | Tau^2 | 0.881 | 13 | 1mo ago |
| SWE-Bench Multilingual | MiMo-V2-Flash | Accuracy | 71.7 | 8 | 1mo ago |
| TerminalBench | LongCat-Flash-Lite | Accuracy | 0.3375 | 7 | 18d ago |
| SWE-Bench | LongCat-Flash-Lite | Accuracy | 54.4 | 7 | 18d ago |
| LiveCodeBench | ALIVE-Self | Pass@1 | 56 | 6 | 1mo ago |
| U-Artifacts | | Pass@1 | 57.8 | 5 | 25d ago |
| Terminal Bench 2.0 | | Pass@1 | 54.2 | 5 | 25d ago |
| AInstein-SWE-Bench | | Pass@1 | 42.8 | 5 | 25d ago |
| Multi-SWE-Bench | | Pass@1 | 44.3 | 5 | 25d ago |
| SWE-Bench Verified | | Pass@1 | 77.2 | 5 | 25d ago |
| Coding unseen tasks (test) | SGE | Pass@1 | 29.2 | 3 | 1mo ago |
| PRDBench | LongCat-Flash-Lite | Accuracy | 39.63 | 2 | 1mo ago |