| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Tool Learning | RestBench TMDB | Success Rate 86.2 | 32 |
| Task Planning | RestBench TMDB | Node F1 82.63 | 25 |
| Sequential Tool Use | RestBench Spotify | Success Rate 86.1 | 22 |
| Tool Planning | RestBench Spotify | Pass Rate 61.25 | 12 |
| Tool Planning | RestBench TMDB | Pass Rate 72.4 | 12 |
| Tool Learning | RestBench Spotify | Success 87.72 | 10 |
| Task Planning | RestBench TMDB v1 (test) | n-F1 82.56 | 4 |
| Tool selection and execution success | RestBench Spotify | Metric - | 0 |
| Tool selection and execution success | RestBench TMDB | Metric - | 0 |