BFCL

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Function Calling | BFCL V3 | Overall Accuracy: 79.3 | 104 |
| Function Calling | BFCL Multi-Turn v3 | Overall Accuracy: 78.7 | 69 |
| Function Calling | BFCL | False Negative Rate: 0 | 56 |
| Tool Use | BFCL | Accuracy: 94 | 45 |
| Tool-use Factuality Evaluation | BFCL Task | Factuality Score: 76 | 42 |
| Function Calling | BFCL Individual Tools per Problem | Execution Accuracy: 95 | 30 |
| Function Calling | BFCL | Success Rate (Simple): 83.27 | 29 |
| Function Calling | BFCL v4 | Score: 68.8 | 25 |
| Function Calling | BFCL (Live) | Simple Accuracy: 88.25 | 24 |
| Multi-Turn Function Calling | BFCL Multi-Turn Base v3 | Greedy Success Rate: 69 | 24 |
| Tool-use | BFCL Multi-turn | Accuracy: 54.75 | 24 |
| Tool-use Inference | BFCL v2 | MAT Score: 5.31 | 22 |
| Function Calling | BFCL Multi-turn | Accuracy: 42.3 | 22 |
| Function Calling | BFCL Single-turn | Accuracy: 84.2 | 22 |
| Function Calling / Tool Use | BFCL parallel parallel-multiple Actions | Accuracy: 82.2 | 20 |
| Function Calling | BFCL Memory | Task Accuracy: 28.22 | 20 |
| Function Calling | BFCL V4 | Multi-Turn Success Rate: 62.3 | 20 |
| Tool Usage | BFCL Multi-Parallel v2 | Accuracy: 87.5 | 20 |
| Tool Usage | BFCL Parallel v2 | Accuracy: 87.5 | 20 |
| Tool Usage | BFCL Multi-Parallel v1 | Accuracy: 90.5 | 20 |
| Tool Usage | BFCL Parallel v1 | Accuracy: 95.5 | 20 |
| Function Calling | BFCL Simple Python | Accuracy: 0.938 | 20 |
| Tool-use agentic performance | BFCL V3 | Avg@4: 79.5 | 19 |
| Execution Accuracy | BFCL v2 | Non-Live AST Accuracy: 88.24 | 18 |
| Tool-calling | BFCL Extended Setting | Non-Live Score: 85.81 | 18 |
Showing 25 of 130 rows