| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Tool Calling | API-Bank L-1 | F1 Name Match | 94.99 | 46 |
| Tool Calling | API-Bank L-2 | Name Match F1 | 90.42 | 25 |
| Tool-augmented reasoning | API-Bank | Success Rate | 79.1 | 12 |
| Tool Calling | API-Bank L-2 v1 (test) | F1 Name Match | 88 | 12 |
| Tool Calling | API-Bank L-1 v1 (test) | F1 Score | 90.78 | 12 |
| Function Calling | API-Bank Level-2 | ROUGE-L | 83.2 | 12 |
| Function Calling | API-Bank Level-1 | ROUGE-L | 93.4 | 12 |
| Tool Learning | API-Bank LV2 | Correctness | 62.41 | 10 |
| Tool Use | API-Bank (test) | Accuracy | 92.6 | 10 |
| Single-agent tool use | API-Bank reconstructed | Correctness | 79.27 | 9 |