| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| ToolAlpaca | MTA | Accuracy | 97.42 | 20 | 1mo ago |
| API-Bench | MTA | Accuracy | 96.3 | 20 | 1mo ago |
| SD Single-domain | MTA | Accuracy | 94.04 | 20 | 1mo ago |
| CD Cross-domain | MTA | Accuracy | 97.33 | 20 | 1mo ago |
| APIGen sampled (test) | TokMem (w/ adapt) | Tool Selection F1 (2 calls) | 99.4 | 15 | 1mo ago |
| Chess Skill: beginner, intermediate, advanced | | Accuracy | 100 | 10 | 1mo ago |
| Chess Specialists: opening, midgame, endgame, late-endgame | | Accuracy | 64.4 | 10 | 1mo ago |
| EnterpriseBench | | Tool Selection Accuracy | 24 | 8 | 25d ago |
| EnterpriseArena | | Tool Selection Accuracy | 45 | 8 | 25d ago |
| MetaTool similar choices subtask (test) | OATS-S1 | Accuracy | 83.4 | 8 | 1mo ago |
| MetaTool 199 tools, 1,287 queries (30% test) | OATS-S1 | R@1 | 83 | 7 | 1mo ago |
| ToolBench 30% 2,413 tools, 180 queries (test) | | Recall@1 | 39.2 | 7 | 1mo ago |
| Trace-based setting | Trace-based | Improvement | 6.8 | 4 | 1mo ago |
| All Tasks | ITR | Tools Correct | 82 | 4 | 1mo ago |
| Seal-Tools (test) | SARL | Top-1 Acc | 99.9 | 2 | 1mo ago |
| GUI-360° (test) | SARL | Top-1 Accuracy | 61.9 | 2 | 1mo ago |