GTM

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result |
|---|---|---|---|
| Tool Use Evaluation | GTM | Average Score | 89.4 |
| Tool Error Detection | GTM Error Detection | Detection Rate | 95.3 |
| Tool Response Generation | GTM Multi-turn | Format Score | 97.2 |
| Tool Response Generation | GTM Single-turn | Format Score | 99.6 |