| Dataset Name | SOTA Method | Metric | Trend | |
|---|---|---|---|---|
| StableToolBench Average | GPT-4 (DFSDT) | SoPR 70.3 | 13 | 4d ago |
| StableToolBench I3-Inst. | GPT-4 (DFSDT) | SoPR 76 | 13 | 4d ago |
| StableToolBench I2-Cat. | DTA-Llama | SoPR 71.9 | 13 | 4d ago |
| StableToolBench I2-Inst. | GPT-4 (Parallel) | SoPR 73.4 | 13 | 4d ago |
| StableToolBench I1-Cat. | GPT-4 (Parallel) | SoPR 70.9 | 13 | 4d ago |
| StableToolBench I1-Tool | GPT-3.5 (DFSDT) | SoPR 73.9 | 13 | 4d ago |
| StableToolBench I1-Inst. | GPT-4 (DFSDT) | SoPR 69 | 13 | 4d ago |
| API-Bank LV2 | ToolCoder | Correctness 62.41 | 10 | 4d ago |
| RestBench Spotify | ToolCoder | Success 87.72 | 10 | 4d ago |
| RestBench TMDB | ToolCoder | Success Rate 85 | 10 | 4d ago |