SOTA Tool Calling benchmarks and papers with code | Wizwand
Tool Calling
Benchmarks
| Dataset Name | SOTA Method | Metric | Score | Results | Last Updated |
| --- | --- | --- | --- | --- | --- |
| API-Bank L-1 | HiTEC-KTO | F1 Name Match | 94.99 | 46 | 1mo ago |
| Supply Chain Tool Calling 1.0 (test) | GPT-5-mini | Accuracy | 86.73 | 38 | 1mo ago |
| Tool-Alpaca | Llama3-70B | Tool Name Accuracy | 91.17 | 31 | 1mo ago |
| Seal-Tools Single-Tool | Llama3-70B | Name Match Score | 98.14 | 30 | 1mo ago |
| SupChain-Bench | SupChain-ReAct | Accuracy | 75.51 | 27 | 1mo ago |
| API-Bank L-2 | HiTEC-KTO | Name Match F1 | 90.42 | 25 | 1mo ago |
| ACEBench Extended Setting | GT_Funs | Overall Score | 65.17 | 18 | 1mo ago |
| BFCL Extended Setting | GT_Funs | Non-Live Score | 85.81 | 18 | 1mo ago |
| ACEBench Standard Setting | ToolGT (Prompting) | Overall Score | 68.92 | 18 | 1mo ago |
| BFCL Standard Setting | All_Funs | Non-Live Accuracy | 86.46 | 18 | 1mo ago |
| DIABENCH Dynamic Evaluation 1.0 | Llama-3.3-Nemotron-DiaFORGE-49B | ACC | 89 | 17 | 5d ago |
| DIABENCH Static Evaluation 1.0 | Llama-3.3-Nemotron-DiaFORGE-49B | Accuracy | 82 | 17 | 5d ago |
| BFCL | Gemini-3-Pro-Preview | Non-Live Success Rate | 90.65 | 17 | 1mo ago |
| F1 Average | Llama3-70B | Tool Call Name F1 | 91.37 | 16 | 1mo ago |
| Nexus Raven | Llama3-70B | Score (Name) | 94.84 | 16 | 11d ago |
| BFCL Multiple | Full-FT | Accuracy | 92.5 | 12 | 1mo ago |
| Nexus Raven v1 (test) | HiTEC-KTO | F1 Name | 94.84 | 12 | 1mo ago |
| Seal-Tools Single-Tool v1 (test) | HiTEC-KTO | F1 Name | 98.14 | 12 | 1mo ago |
| Tool-Alpaca v1 (test) | HiTEC-KTO | F1 Name | 87.63 | 12 | 1mo ago |
| API-Bank L-2 v1 (test) | HiTEC-KTO | F1 Name Match | 88 | 12 | 1mo ago |
| API-Bank L-1 v1 (test) | HiTEC-KTO | F1 Score | 90.78 | 12 | 1mo ago |
| BFCL out-of-domain | TInR-U | Exact Match (EM) | 26 | 10 | 4d ago |
| In-domain (unseen) | TInR-U | Exact Match (EM) | 57.24 | 10 | 4d ago |
| In-domain seen | TInR-U | EM | 74.05 | 10 | 4d ago |
| Average across 5 benchmarks | HiTEC-ICL | F1 (Name) | 88.47 | 9 | 1mo ago |
Showing 25 of 42 rows