Agentic Dialogue on τ-Bench (test)
[Chart: Retail Accuracy, Airline Accuracy, and Average Accuracy by method. Highlighted point: 60.4 Retail Accuracy, GPT-4o-2024-11-20, Aug 18, 2025.]
Updated 4d ago
Evaluation Results
Method                                   Date      Retail Accuracy   Airline Accuracy   Average Accuracy
GPT-4o-2024-11-20                        2025.08   60.4              42                 51.2
Llama3.1-70B-Inst                        2025.08   50.4              26                 38.2
ToolACE-MT                               2025.08   25.2              16                 20.6
ToolACE-MT (Ablation=Without Offli...)   2025.08   22.6              6                  14.3
Multi-Agent Simulation                   2025.08   21.7              10                 15.9
ToolACE-MT (Ablation=Without Itera...)   2025.08   9.5               6                  7.8
Llama3.1-8B-Inst                         2025.08   6.1               26                 16.1
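
The page does not say how Average Accuracy is computed. The listed values are consistent with an unweighted mean of the Retail and Airline scores, rounded to one decimal place. The Python sketch below checks that assumption against the rows above. The helper names (`mean_accuracy`, `inconsistent_rows`) and the tolerance are illustrative and not part of the benchmark.

```python
# Rows copied from the results above: (method, retail, airline, reported average).
# Truncated method labels are kept exactly as they appear on the page.
ROWS = [
    ("GPT-4o-2024-11-20", 60.4, 42.0, 51.2),
    ("Llama3.1-70B-Inst", 50.4, 26.0, 38.2),
    ("ToolACE-MT", 25.2, 16.0, 20.6),
    ("ToolACE-MT (Ablation=Without Offli...)", 22.6, 6.0, 14.3),
    ("Multi-Agent Simulation", 21.7, 10.0, 15.9),
    ("ToolACE-MT (Ablation=Without Itera...)", 9.5, 6.0, 7.8),
    ("Llama3.1-8B-Inst", 6.1, 26.0, 16.1),
]


def mean_accuracy(retail: float, airline: float) -> float:
    """Unweighted mean of the two domain scores (assumed definition)."""
    return (retail + airline) / 2


def inconsistent_rows(rows, tol: float = 0.051):
    """Return methods whose reported average differs from the mean by more than tol.

    The tolerance allows for rounding to one decimal place plus float error.
    """
    return [
        name
        for name, retail, airline, reported in rows
        if abs(mean_accuracy(retail, airline) - reported) > tol
    ]


if __name__ == "__main__":
    for name, retail, airline, reported in ROWS:
        print(f"{name}: mean={mean_accuracy(retail, airline):.2f}, reported={reported}")
    print("Inconsistent rows:", inconsistent_rows(ROWS))
```

An empty list from `inconsistent_rows(ROWS)` means every reported average is within rounding distance of the simple mean. It does not confirm that the benchmark defines the metric this way.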