Tool Use Evaluation

Benchmarks

Task Name            Dataset Name                 SOTA Result               Trend
Tool Calling         Tool Use Evaluation (test)   Exact Match (EM): 61.31   3
Tool Identification  Tool Use Evaluation (test)   EM Accuracy: 78.3         3