Tool Calling on BFCL Multiple
[Chart: Accuracy over time. Top result: 92.5 (Full-FT), Mar 3, 2026. Updated 1mo ago.]
Evaluation Results
Method            Configuration               Date      Accuracy
Full-FT           Backbone=Qwen3-14B-Bas...   2026.03   92.5
SUN               Backbone=Qwen3-14B-Bas...   2026.03   92.5
Full-FT           Backbone=Qwen3-8B-Base...   2026.03   92
SUN               Backbone=Qwen3-8B-Base...   2026.03   91.5
SUN               Backbone=LLaMA3.1-8B,...    2026.03   90.5
Full-FT           Backbone=Qwen3-1.7B-Ba...   2026.03   90
SUN               Backbone=Qwen3-1.7B-Ba...   2026.03   89
Full-FT           Backbone=LLaMA3.1-8B,...    2026.03   88
Qwen3-8B-Base     Decode Execution=N/A        2026.03   80
Qwen3-14B-Base    Decode Execution=N/A        2026.03   77
Qwen3-1.7B-Base   Decode Execution=N/A        2026.03   60
LLaMA3.1-8B       Decode Execution=N/A        2026.03   45.5