
ToolAlpaca

Benchmarks

Task Name              Dataset Name           SOTA Result                  Trend
Tool Use               ToolAlpaca             Tool Use Success Rate: 77.9  26
Tool-use Inference     ToolAlpaca             Match Rate: 5.26             22
Tool-use reasoning     ToolAlpaca             Accuracy: 66.73              20
Tool selection         ToolAlpaca             Accuracy: 97.42              20
Tool usage simulation  ToolAlpaca evaluation  Procedure Score: 78.38       12