| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Tool Calling | API-Bank L-1 | F1 Name Match | 94.99 | 46 |
| Stepwise tool-use | API-Bank (test) | Success Rate | 74 | 28 |
| Tool Calling | API-Bank L-2 | Name Match F1 | 90.42 | 25 |
| Tool-use Inference | API-Bank | Match Rate (#MAT) | 5.8 | 22 |
| API Use | API-Bank | Success Rate | 77.19 | 18 |
| Tool Use | API-Bank Level 2 | Accuracy | 66.22 | 18 |
| Tool Use | API-Bank (test) | Accuracy | 92.6 | 16 |
| Tool-augmented reasoning | API-Bank | Success Rate | 79.1 | 12 |
| Tool Calling | API-Bank L-2 v1 (test) | F1 Name Match | 88 | 12 |
| Tool Calling | API-Bank L-1 v1 (test) | F1 Score | 90.78 | 12 |
| Function Calling | API-Bank Level-2 | ROUGE-L | 83.2 | 12 |
| Function Calling | API-Bank Level-1 | ROUGE-L | 93.4 | 12 |
| Tool Use | API Bank | Accuracy | 90 | 10 |
| Tool Learning | API-Bank LV2 | Correctness | 62.41 | 10 |
| Single-agent tool use | API-Bank reconstructed | Correctness | 79.27 | 9 |
| Function Calling | API-Bank | Level-1 Score | 79.17 | 8 |
| Tool Retrieval and Calling | API-Bank Call+Retrieve | Task Completion Rate | 26.9 | 8 |
| Tool Calling | API-Bank Call | Task Completion Rate | 34.7 | 8 |
| Tool Use | API-Bank (L1) | Score | 81.3 | 6 |
| Tool Sequence Recommendation | API-Bank Level-3 50 instances (LOO-CV) | Set F1 | 94.5 | 6 |
| Tool Use | API-Bank L2 cleaned (test) | F1 (API Matching) | 87.32 | 5 |