| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Function Selection | TaskBench HuggingFace | Function Selection Accuracy | 77.1 | 45 |
| Function Selection | TaskBench Multimedia | Function Selection Accuracy | 82.3 | 36 |
| Function Selection | TaskBench DailyLife | Function Selection Accuracy | 96.8 | 36 |
| Task Planning | TaskBench Daily Life | Node-F1 | 97.36 | 25 |
| Task Planning | TaskBench Multimedia | Node F1 | 88.54 | 25 |
| Tool Retrieval and Function Selection | Taskbench-HF | MRR | 0.75 | 18 |
| Task Planning | TaskBench Multimedia v1 (test) | n-F1 | 88.54 | 14 |
| Tool selection | TaskBench-MM | F1 Score | 20.7 | 12 |
| Tool selection | TaskBench | F1 Score | 21.1 | 12 |
| Tool Retrieval and Function Selection | Taskbench DL | Function Selection Accuracy | 58.9 | 9 |
| Tool Retrieval and Function Selection | Taskbench-MM | Function Selection Accuracy | 27 | 9 |
| Multi-agent Planning and Composition | TaskBench Multimedia APIs (test) | Precision | 91.12 | 1 |
| Multi-agent Planning and Composition | TaskBench Hugging Face APIs (test) | Precision | 74.97 | 1 |
| Multi-agent Planning and Composition | TaskBench Dailylife APIs (test) | Precision | 95.01 | 1 |