T-Eval

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Agent Tool Use | T-eval (Held-Out) | Accuracy: 71.8 | 14 |
| Tool Evaluation | T-Eval | English Score: 67.6 | 13 |