Agentic Performance on TAU-2 Bench
[Chart: Airline Score by model; displayed point: Qwen3-235B, 47.5, Jan 30, 2026. Selectable metrics: Airline Score, Telecom Score, Retail Score, Average Score.]
Updated 4d ago
Evaluation Results
Method          Details                     Date     Airline Score  Telecom Score  Retail Score  Average Score
Qwen3-235B      full_name=Qwen3-235B-A...   2026.01  47.5           37.7           68            49.7
SYNTHAGENT-14B  setting=non-thinking,...    2026.01  40             44.7           58.6          46.3
SYNTHAGENT-8B   setting=non-thinking,...    2026.01  34.5           38.2           57.2          42.9
Qwen3-32B       setting=non-thinking,...    2026.01  22.5           27.6           44.7          36
Qwen3-14B       setting=non-thinking,...    2026.01  22             25.4           39.5          30.6
ToolStar-14B    setting=non-thinking,...    2026.01  18             30.7           40.4          35.7
ToolStar-8B     setting=non-thinking,...    2026.01  13.5           25.9           39.5          31.7