
ToolSandbox

Benchmarks

Task Name                         Dataset Name        SOTA Result               Trend
Agent Task Completion             ToolSandbox (test)  Avg Task Reward 0.704     27
Tool Use Evaluation               ToolSandbox         Similarity 0.923          12
Multi-turn agent decision making  ToolSandbox (test)  Success Rate 52.2         7
Agent Task Completion             ToolSandbox         Average Task Reward 0.67  2