
tau^2 Bench

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Agentic Tool-use | tau^2 Bench official evaluation setting GPT-4.1 simulator | Retail Score: 0.775 | 9
Agentic Capability | tau^2-Bench Telecom | Pass@1: 89 | 7
Agentic performance | TAU-2 Bench | Airline Score: 47.5 | 7
Tool Use | tau^2-Bench | Pass@1: 85.4 | 5