SOTA Tool Calling benchmarks and papers with code | Wizwand
Tool Calling
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
|---|---|---|---|---|---|
| Confetti | Gemini-3.1 Flash-Live | AST Soft Accuracy | 72.7 | 49 | 19d ago |
| API-Bank L-1 | HiTEC-KTO | F1 Name Match | 94.99 | 46 | 20d ago |
| When2Call | GPT-1.5 Realtime | F1 Score | 76.8 | 42 | 19d ago |
| Supply Chain Tool Calling 1.0 (test) | GPT-5-mini | Accuracy | 86.73 | 38 | 3mo ago |
| Tool-Alpaca | Llama3-70B | Tool Name Accuracy | 91.17 | 31 | 3mo ago |
| Seal-Tools Single-Tool | Llama3-70B | Name Match Score | 98.14 | 30 | 3mo ago |
| SupChain-Bench | SupChain-ReAct | Accuracy | 75.51 | 27 | 3mo ago |
| API-Bank L-2 | HiTEC-KTO | Name Match F1 | 90.42 | 25 | 3mo ago |
| ACEBench Extended Setting | GT_Funs | Overall Score | 65.17 | 18 | 2mo ago |
| BFCL Extended Setting | GT_Funs | Non-Live Score | 85.81 | 18 | 2mo ago |
| ACEBench Standard Setting | ToolGT (Prompting) | Overall Score | 68.92 | 18 | 2mo ago |
| BFCL Standard Setting | All_Funs | Non-Live Accuracy | 86.46 | 18 | 2mo ago |
| BFCL (Berkeley Function Calling Leaderboard) | Llama-3.1-8B-Instruct | Single-Turn Non-Live Success Rate | 85.2 | 17 | 23d ago |
| DIABENCH Dynamic Evaluation 1.0 | Llama-3.3-Nemotron-DiaFORGE-49B | ACC | 89 | 17 | 1mo ago |
| DIABENCH Static Evaluation 1.0 | Llama-3.3-Nemotron-DiaFORGE-49B | Accuracy | 82 | 17 | 1mo ago |
| BFCL | Gemini-3-Pro-Preview | Non-Live Success Rate | 90.65 | 17 | 2mo ago |
| BFCL Live | ParaTool (ours) | Multiple Success Rate | 79.01 | 16 | 5d ago |
| BFCL Non-live | ParaTool (ours) | Multiple Success Rate | 96.67 | 16 | 5d ago |
| Stable Toolbench I3-Inst | ParaTool | Pass Rate | 68.85 | 16 | 5d ago |
| Stable Toolbench I2-Cat | ParaTool | Pass Rate | 78.3 | 16 | 5d ago |
| Stable Toolbench I2-Inst | ParaTool | Pass Rate | 77.42 | 16 | 5d ago |
| Stable Toolbench I1-Tool | ParaTool | Pass Rate | 75.95 | 16 | 5d ago |
| Stable Toolbench I1-Cat | ParaTool | Pass Rate | 76.07 | 16 | 5d ago |
| Stable Toolbench I1-Inst | ParaTool | Pass Rate | 0.7909 | 16 | 5d ago |
| F1 Average | Llama3-70B | Tool Call Name F1 | 91.37 | 16 | 3mo ago |
Showing 25 of 68 rows