| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Error category prediction | TRAIL Planning and Reasoning categories (117 traces) | Micro F1 49.7 | 6 |
| Multi-agent recommendation | trail-benchmark | Top-1 Accuracy 100 | 4 |
| Single-agent tool selection | TRAIL | Top-1 Accuracy 98.15 | 4 |