Agentic Dialogue on τ-Bench (test)

60.4 Retail Accuracy

GPT-4o-2024-11-20

[Chart: Retail Accuracy, Aug 18, 2025]
Updated 4d ago

Evaluation Results

Method  Links
60.4  42  51.2
50.4  26  38.2
2025.08
25.2  16  20.6
2025.08
22.6  6  14.3
21.7  10  15.9
2025.08
9.5  6  7.8
6.1  26  16.1
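
Each result row above can be read as three values run together (for example 60.4, 42, 51.2), and under that reading the third value looks like the mean of the first two. The page itself does not label the second and third columns, so this split is an assumption. A minimal Python sketch to check it, with the hypothetical helper names `mean_half_up` and `check_rows`:

```python
from decimal import Decimal, ROUND_HALF_UP

# Rows as read from the results list: (first score, second score, third score).
# Splitting each fused string into three values is an assumption.
ROWS = [
    ("60.4", "42", "51.2"),
    ("50.4", "26", "38.2"),
    ("25.2", "16", "20.6"),
    ("22.6", "6", "14.3"),
    ("21.7", "10", "15.9"),
    ("9.5", "6", "7.8"),
    ("6.1", "26", "16.1"),
]


def mean_half_up(a: str, b: str) -> Decimal:
    """Mean of two decimal strings, rounded half-up to one decimal place."""
    mean = (Decimal(a) + Decimal(b)) / 2
    return mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def check_rows(rows):
    """Return one bool per row: does the third value equal the rounded mean?"""
    return [mean_half_up(a, b) == Decimal(c) for a, b, c in rows]


if __name__ == "__main__":
    for row, ok in zip(ROWS, check_rows(ROWS)):
        print(row, "consistent" if ok else "inconsistent")
```

Decimal arithmetic is used instead of floats so that values such as 15.85 round half-up to 15.9 rather than depending on binary floating-point representation.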