
BFCL

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Function Calling | BFCL V3 | Overall Accuracy 79.3 | 104 |
| Function Calling | BFCL Multi-Turn v3 | Overall Accuracy 68.38 | 41 |
| Function Calling | BFCL Individual Tools per Problem | Execution Accuracy 95 | 30 |
| Tool-use | BFCL Multi-turn | Accuracy 54.75 | 24 |
| Tool-use Inference | BFCL v2 | MAT Score 5.31 | 22 |
| Function Calling | BFCL Multi-turn | Accuracy 42.3 | 22 |
| Function Calling | BFCL Single-turn | Accuracy 84.2 | 22 |
| Function Calling | BFCL Simple Python | Accuracy 0.938 | 20 |
| Tool-use agentic performance | BFCL V3 | Avg@4 79.5 | 19 |
| Tool-calling | BFCL Extended Setting | Non-Live Score 85.81 | 18 |
| Tool-calling | BFCL Standard Setting | Non-Live Accuracy 86.46 | 18 |
| Tool-Use Agent Evaluation | BFCL Multiturn (OOD) v3 (test) | Base Rate 48 | 18 |
| Function Calling | BFCL | Energy (Wh) 4.2 | 18 |
| Throughput Efficiency | BFCL | Throughput 5,093 | 18 |
| Tool-calling | BFCL | Non-Live Success Rate 90.65 | 17 |
| Function Calling | BFCL Multi-Turn v4 (test) | Overall Acc 46.75 | 17 |
| Multi-Turn Tool Calling | BFCL v4 (val) | Overall Accuracy 85 | 15 |
| Function Calling | BFCL | Accuracy 77.9 | 14 |
| Tool-Augmented Planning | BFCL v3 | Live Success Rate 84.1 | 14 |
| Tool-augmented reasoning | BFCL Multi-Turn v3 | Overall Score 69.1 | 14 |
| Function Calling | BFCL (Held-In) | Accuracy 89.4 | 14 |
| Function Calling | BFCL v4 | Score 68.8 | 13 |
| Tool calling | BFCL Multiple | Accuracy 92.5 | 12 |
| Function Calling | BFCL Exec v3 | Overall Accuracy 94.6 | 12 |
| Function Calling | BFCL Live v3 | Overall Accuracy 77.9 | 12 |
Showing 25 of 68 rows