| Dataset Name | SOTA Method | Metric | Value | Trend | |
|---|---|---|---|---|---|
| SWE dev full set (test) | ChatDev (Joint FT) | Total Overstepping Rate | 8.4 | 4 | 13d ago |
| SWE Dev hard (test) | ChatDev (Joint FT) | Overstepping Rate | 6.8 | 4 | 13d ago |
| SWE easy subset dev (test) | ChatDev (Joint FT) | Overstepping Rate | 10 | 4 | 13d ago |