API-Bank

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Tool Calling | API-Bank L-1 | F1 Name Match: 94.99 | 46 |
| Stepwise tool-use | API-Bank (test) | Success Rate: 74 | 28 |
| Tool Calling | API-Bank L-2 | Name Match F1: 90.42 | 25 |
| Tool-use Inference | API-Bank | Match Rate (#MAT): 5.8 | 22 |
| API Use | API-Bank | Success Rate: 77.19 | 18 |
| Tool Use | API-Bank Level 2 | Accuracy: 66.22 | 18 |
| Tool Use | API-Bank (test) | Accuracy: 92.6 | 16 |
| Tool-augmented reasoning | API-Bank | Success Rate: 79.1 | 12 |
| Tool Calling | API-Bank L-2 v1 (test) | F1 Name Match: 88 | 12 |
| Tool Calling | API-Bank L-1 v1 (test) | F1 Score: 90.78 | 12 |
| Function Calling | API-Bank Level-2 | ROUGE-L: 83.2 | 12 |
| Function Calling | API-Bank Level-1 | ROUGE-L: 93.4 | 12 |
| Tool Use | API Bank | Accuracy: 90 | 10 |
| Tool Learning | API-Bank LV2 | Correctness: 62.41 | 10 |
| Single-agent tool use | API-Bank reconstructed | Correctness: 79.27 | 9 |
| Function Calling | API-Bank | Level-1 Score: 79.17 | 8 |
| Tool Retrieval and Calling | API-Bank Call+Retrieve | Task Completion Rate: 26.9 | 8 |
| Tool Calling | API-Bank Call | Task Completion Rate: 34.7 | 8 |
| Tool Use | API-Bank (L1) | Score: 81.3 | 6 |
| Tool Sequence Recommendation | API-Bank Level-3 50 instances (LOO-CV) | Set F1: 94.5 | 6 |
| Tool Use | API-Bank L2 cleaned (test) | F1 (API Matching): 87.32 | 5 |