
MetaTool

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Tool-use decision-making | MetaTool (test) | Decision Accuracy: 90.2 | 38 |
| Tool selection | MetaTool similar choices subtask (test) | Accuracy: 83.4 | 8 |
| Adaptive Tool Use | MetaTool | Tool Invocations Count: 520 | 8 |
| Tool Selection | MetaTool 199 tools, 1,287 queries (30% test) | R@1: 83 | 7 |
| Tool Selection Attack | MetaTool (test) | TDR: 97.2 | 3 |