Tau-bench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Agent Performance | tau-bench | Retail Accuracy 78.3 | 55 |
| Multi-turn tool-use interaction | tau-bench | Retail Success Rate 86.1 | 35 |
| Agentic Tool-use | τ2-Bench (Tau-bench) Retail and Telecom | Overall Success Rate 85.79 | 17 |
| Tool-use | Tau-Bench | TAU-AIR Score 67.5 | 14 |
| Agentic Tool-Use | Tau-Bench | Retail Score 71.3 | 13 |
| Tool-use performance | Tau-bench Retail (test) | Pass Rate 66 | 12 |
| Multi-turn agent decision making | tau-Bench (test) | Success Rate 55.8 | 7 |
| Tool Use | tau-Bench | Pass@1 85.4 | 6 |
| Function-calling | Tau-bench retail | Success Rate 46 | 5 |
| Function-calling | Tau-bench airline | Success Rate 50 | 5 |
| Agentic Performance | TAU2-Bench | Success Rate 85.4 | 5 |
| Tool | tau2-Bench | Accuracy 15 | 4 |