
GTM

Benchmarks

Task Name                 Dataset Name         SOTA Result            Trend
Tool Use Evaluation       GTM                  Average Score: 89.4    11
Tool Error Detection      GTM Error Detection  Detection Rate: 95.3   11
Tool Response Generation  GTM Multi-turn       Format Score: 97.2     11
Tool Response Generation  GTM Single-turn      Format Score: 99.6     11