tau^2 Bench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Agentic Tool-use | tau^2 Bench (official evaluation setting, GPT-4.1 simulator) | Retail Score: 0.775 | 9 |
| Agentic performance | TAU-2 Bench | Airline Score: 47.5 | 7 |