
Retail-3I

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Tool-calling | Retail-3I 1.0 (Infeasible) | Pass@1: 0.578 | 2 |
| Tool-calling | Retail-3I Changing 1.0 | Pass@1: 61.8 | 2 |
| Tool-calling | Retail-3I Ambiguous 1.0 | Pass@1: 0.696 | 2 |
| Tool-calling | Retail-3I General 1.0 | Pass@1: 73.6 | 2 |