MetaTool

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Tool-use agent performance | MetaTool | ASR (Success Rate): 85.9 | 50 |
| Tool Selection | MetaTool | Similarity: 80.8 | 39 |
| Tool-use decision-making | MetaTool (test) | Decision Accuracy: 90.2 | 38 |
| Tool selection | MetaTool similar choices subtask (test) | Accuracy: 83.4 | 8 |
| Adaptive Tool Use | MetaTool | Tool Invocations Count: 520 | 8 |
| Tool Selection | MetaTool 199 tools, 1,287 queries (30% test) | R@1: 83 | 7 |
| Memory-Poisoning Attack | MetaTool | Attack Hit Rate (AHR): 93.8 | 3 |
| Tool Selection Attack | MetaTool (test) | TDR: 97.2 | 3 |