SOTA Function Calling benchmarks and papers with code | Wizwand
Function Calling
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
|---|---|---|---|---|---|
| BFCL V3 | D-CORE-14B | Overall Accuracy | 79.3 | 104 | 1d ago |
| BFCL Multi-Turn v3 | MIND-Skill | Overall Accuracy | 78.7 | 69 | 20d ago |
| BFCL | MPMA DPMA | False Negative Rate | 0 | 56 | 1mo ago |
| Tool-Alpaca | GPT-4o | F1 Score | 77.66 | 40 | 1mo ago |
| BFCL Individual Tools per Problem | Gold Documentation (Oracle) | Execution Accuracy | 95 | 30 | 3mo ago |
| ToolBench Average | ParaTool | Pass Rate | 75.95 | 30 | 5d ago |
| BFCL | OpenFunctions-v2 | Success Rate (Simple) | 83.27 | 29 | 1mo ago |
| BFCL (Berkeley Function Calling Leaderboard) | GenEnv | Base Score | 41.8 | 28 | 3mo ago |
| Berkeley Function Call Leaderboard (BFCL) Live (Out-of-Domain) | Qwen3-4B | AST Simple | 0.876 | 26 | 3mo ago |
| Berkeley Function Call Leaderboard (BFCL) Non-Live Out-of-Domain | Hammer2.1-3B | AST Simple | 81.4 | 26 | 3mo ago |
| BFCL v4 | Claude-Sonnet-4.5 | Score | 68.8 | 25 | 13d ago |
| BFCL (Live) | GENESISFUNC-8B | Simple Accuracy | 88.25 | 24 | 5d ago |
| BFCL Multi-turn | EVOTOOL | Accuracy | 42.3 | 22 | 2mo ago |
| BFCL Single-turn | EvoPrompt | Accuracy | 84.2 | 22 | 2mo ago |
| BFCL Memory | SGLang FP4 | Task Accuracy | 28.22 | 20 | 15d ago |
| Berkeley Function Calling Leaderboard (BFCL) Overall November 19, 2025 | R2IF | Non-live Accuracy | 69.44 | 20 | 1mo ago |
| ACEBench | R2IF | Atom Score | 78 | 20 | 1mo ago |
| BFCL V4 | Seed 2.0 | Multi-Turn Success Rate | 62.3 | 20 | 12d ago |
| BFCL Simple Python | Full-FT | Accuracy | 0.938 | 20 | 3mo ago |
| Berkeley Function Call Leaderboard (BFCL) online inference setting | Qwen3-8B | Input Tokens | 621.13 | 19 | 3mo ago |
| TB-MM | DTDR-L | FSA | 64.1 | 18 | 3mo ago |
| TB-HF | DTDR-L | FSA | 60.5 | 18 | 3mo ago |
| TB-DL | DTDR-L | FSA | 89 | 18 | 3mo ago |
| TinyAgent | DTDR-L | FSA | 0.807 | 18 | 3mo ago |
| BFCL | Benign | Energy (Wh) | 4.2 | 18 | 3mo ago |
Showing 25 of 101 rows