
T-Eval

Benchmarks

Task Name         Dataset Name         SOTA Result            Trend
Agent Tool Use    T-eval (Held-Out)    Accuracy: 71.8         14
Tool Evaluation   T-Eval               English Score: 67.6    13