Agentic performance on BFCL Multi-turn V4
[Chart: Base Score by date. Labeled point: Qwen3-235B, 58.5 (Jan 30, 2026). Metrics: Base Score, Miss Function Rate, Miss Parameter Rate, Long Context Score, Average Score.]
Updated 4d ago
Evaluation Results
Method          Details                    Date     Base Score  Miss Function Rate  Miss Parameter Rate  Long Context Score  Average Score
Qwen3-235B      full_name=Qwen3-235B-A...  2026.01  58.5        47.5                35                   54                  49.7
SYNTHAGENT-14B  setting=non-thinking,...   2026.01  57          46.5                31                   46                  46.3
ToolStar-14B    setting=non-thinking,...   2026.01  56.5        35.5                29.5                 39.5                35.7
SYNTHAGENT-8B   setting=non-thinking,...   2026.01  54.5        45.5                33                   37.5                42.9
ToolStar-8B     setting=non-thinking,...   2026.01  52          38                  22.5                 30.5                31.7
Qwen3-32B       setting=non-thinking,...   2026.01  50.5        43                  30.5                 33                  36
Qwen3-14B       setting=non-thinking,...   2026.01  40          34.5                26.5                 26.5                30.6