
Tool use

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Tool Use | tool-use (test) | Accuracy: 72 | 24 |
| Tool Use Reasoning | Tool use | Mean Accuracy @16: 61.31 | 24 |
| Tool Use | Tool Use Non-Live | Para: 0.925 | 15 |
| Tool Use | Tool Use Live | Para Score: 56.25 | 15 |
| Tool Use | Tool use | Avg@16: 68.5 | 14 |
| Tool Use | Tool-use multi-turn (test) | Accuracy: 76.8 | 6 |
| Tool Use | Tool Use Domain Average | Tool Use Average Accuracy: 71.1 | 4 |