| Dataset Name | SOTA Method | Metric | Value | Trend | |
|---|---|---|---|---|---|
| MetaTool | PA-Tool | Similarity | 80.8 | 39 | 8d ago |
| ToolAlpaca | MTA | Accuracy | 97.42 | 20 | 3mo ago |
| API-Bench | MTA | Accuracy | 96.3 | 20 | 3mo ago |
| SD Single-domain | MTA | Accuracy | 94.04 | 20 | 3mo ago |
| CD Cross-domain | MTA | Accuracy | 97.33 | 20 | 3mo ago |
| APIGen sampled (test) | TokMem (w/ adapt) | Tool Selection F1 (2 calls) | 99.4 | 15 | 2mo ago |
| BFCL | OLIVIA | F1 Score | 36.6 | 14 | 8d ago |
| TaskBench-MM | OLIVIA | F1 Score | 20.7 | 12 | 21d ago |
| TaskBench | CLIN | F1 Score | 21.1 | 12 | 21d ago |
| ToolBench | OLIVIA | F1 Score | 58.5 | 12 | 21d ago |
| Chess Skill: beginner, intermediate, advanced | | Accuracy | 100 | 10 | 3mo ago |
| Chess Specialists: opening, midgame, endgame, late-endgame | | Accuracy | 64.4 | 10 | 3mo ago |
| EnterpriseBench | | Tool Selection Accuracy | 24 | 8 | 2mo ago |
| EnterpriseArena | | Tool Selection Accuracy | 45 | 8 | 2mo ago |
| MetaTool similar choices subtask (test) | OATS-S1 | Accuracy | 83.4 | 8 | 2mo ago |
| MetaTool 199 tools, 1,287 queries (30% test) | OATS-S1 | R@1 | 83 | 7 | 2mo ago |
| ToolBench 30% 2,413 tools, 180 queries (test) | | Recall@1 | 39.2 | 7 | 2mo ago |
| Trace-based setting | Trace-based | Improvement | 6.8 | 4 | 3mo ago |
| All Tasks | ITR | Tools Correct | 82 | 4 | 3mo ago |
| Seal-Tools (test) | SARL | Top-1 Acc | 99.9 | 2 | 3mo ago |
| GUI-360° (test) | SARL | Top-1 Accuracy | 61.9 | 2 | 3mo ago |
| ToolBench | BoR | BoR Found Rate | 61.9 | 1 | 8d ago |