Tool Use on TAU-2 & BFCL-V4 Multi-turn
[Chart: Average Score by date. Point shown: Qwen3-235B-A22B-Instruct, Average Score 52, Apr 23, 2026.]
Updated 1mo ago
Evaluation Results
Method                     Setting                     Date      Average Score
Qwen3-235B-A22B-Instruct   Reasoning Method=non-t...   2026.04   52
AgenticQwen-30B-A3B        Reasoning Method=non-t...   2026.04   50.2
AgenticQwen-8B             Reasoning Method=non-t...   2026.04   47.4
Qwen3-30B-A3B-Instruct     Reasoning Method=non-t...   2026.04   36.2
Qwen3-32B                  Reasoning Method=non-t...   2026.04   36
Qwen3-8B                   Reasoning Method=non-t...   2026.04   23.8