
Tool use

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Tool Use | Tool Use Non-Live | Para | 0.925 | 15 |
| Tool Use | Tool Use Live | Para Score | 56.25 | 15 |
| Tool Use | Tool use | Avg@16 | 68.5 | 14 |
| Tool Use Reasoning | Tool use | Avg Accuracy @16 (1h) | 68 | 8 |
| Tool Use | Tool-use multi-turn (test) | Accuracy | 76.8 | 6 |