| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Model Learning from Noisy Data | SWE (Shallow Water Equations) system | Full-field Avg Relative Error | 4.27 | 18 |
| Software Engineering | SWE Verified | Resolution Rate | 77.2 | 17 |
| Code | SWE Verified Agentless | pass@1 | 57.6 | 8 |
| Software Engineering Automation | SWE Multilingual | Resolved | 70.2 | 5 |
| Watermark Detection | SWE (test) | Delta Q (Δ̂q) | 0.71 | 4 |
| Agent Trajectory Performance | SWE (test) | Pass@1 Accuracy (%) | 12.7 | 4 |
| Historical Normalization | swe historical normalization (test) | Accuracy | 0.579 | 4 |
| Solution Prediction | SWE | Relative L2 Error (Data) | 2.15 | 3 |
| Learning PDE Dynamics | SWE | Relative L2 Error | 0.005 | 2 |
| State Rollout | SWE | Metric | - | 0 |