BFCL

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Function Calling | BFCL V3 | Overall Accuracy: 79.3 | 88 |
| Function Calling | BFCL Multi-Turn v3 | Overall Accuracy: 68.38 | 41 |
| Function Calling | BFCL Individual Tools per Problem | Execution Accuracy: 95 | 30 |
| Tool-use | BFCL Multi-turn | Accuracy: 54.75 | 24 |
| Function Calling | BFCL | Energy (Wh): 4.2 | 18 |
| Throughput Efficiency | BFCL | Throughput: 5,093 | 18 |
| Function Calling | BFCL Multi-Turn v4 (test) | Overall Acc: 46.75 | 17 |
| Multi-Turn Tool Calling | BFCL v4 (val) | Overall Accuracy: 85 | 15 |
| Tool-Augmented Planning | BFCL v3 | Live Success Rate: 84.1 | 14 |
| Tool-augmented reasoning | BFCL Multi-Turn v3 | Overall Score: 69.1 | 14 |
| Function Calling | BFCL (Held-In) | Accuracy: 89.4 | 14 |
| Function Calling | BFCL v4 | Score: 68.8 | 13 |
| Documentation Generation | BFCL Opaque | Semantic Similarity: 78 | 12 |
| Tool Use | BFCL | Accuracy: 66.3 | 12 |
| Function Calling | BFCL Executable (test) | Success Rate (Simple, Python): 100 | 12 |
| Function Calling | BFCL V3 (test) | Overall Accuracy: 63.34 | 11 |
| Multi-Turn Function Calling | BFCL Multi-Turn Base v3 | Greedy Success Rate: 44.2 | 11 |
| Agent & Alignment | BFCL v3 | Score: 75.61 | 10 |
| Tool-use | BFCL Single-Turn | OA: 84.11 | 10 |
| Function Calling | BFCL v3 2025-08-26 (test) | Multi-Turn Overall Accuracy: 50 | 9 |
| Function calling | BFCL Multi-Turn Base v3 (val) | Avg@8: 43 | 9 |
| Tool Use | BFCL (test) | Accuracy: 90.2 | 9 |
| Tool Use | BFCL Agentic v4 (out-of-distribution) | Web-base Score: 39 | 8 |
| Function Calling | BFCL Simple Python | Accuracy: 0.923 | 8 |
| Tool Calling | BFCL V3 | pass@1: 70.4 | 7 |
Showing 25 of 42 rows