| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Error Attribution | Who&When | Pair µF1: 8.1 | 30 |
| Failure Attribution | Who&When | Agent Accuracy: 60.79 | 22 |
| Trajectory Attribution | Who&When n=58 (Hand-Crafted) | Agent-level Accuracy: 73 | 15 |
| Trajectory Attribution | Who&When Algorithm-Generated n=126 | Agent-level Accuracy: 68 | 15 |
| Failure Attribution | Who&When Total | Step-level Accuracy: 36.22 | 13 |
| Failure Attribution | Who&When Hand-Crafted | Step-level Accuracy: 41.38 | 13 |
| Failure Attribution | Who&When Algorithm-Generated | Step-level Accuracy: 42.86 | 13 |
| Online auditing | Who&When | Step Accuracy: 57.69 | 8 |
| Error Forecasting | Who&When | Eta (%): 100 | 6 |
| Failure attribution | Who & When Boundary | Agent Attribution Accuracy: 38.71 | 6 |
| Failure attribution | Who & When Remove ID | Agent Attribution Accuracy: 26.47 | 6 |
| Failure attribution | Who & When Baseline | Agent Attribution Accuracy: 54.33 | 6 |