
τ2-bench

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Agent Task Completion | τ2-BENCH (test) | Average Task Reward: 0.921 | 27
Agentic Workflow Success | τ2-bench | Airline Success Rate: 60 | 13
Agentic | τ2-Bench | Score: 91.6 | 7
Multi-turn tool calling | τ2-bench | Overall Score: 17.77 | 5
Web-based Decision-making | τ2 Bench Retail, Telecom, Airline | Retail Score: 48.3 | 5
Agent | τ2-Bench | Accuracy: 69.5 | 4