
Retail-3I

Benchmarks

Task Name     Dataset Name                 SOTA Result    Trend
Tool-calling  Retail-3I 1.0 (Infeasible)   Pass@1 0.578   2
Tool-calling  Retail-3I Changing 1.0       Pass@1 61.8    2
Tool-calling  Retail-3I Ambiguous 1.0      Pass@1 0.696   2
Tool-calling  Retail-3I General 1.0        Pass@1 73.6    2