Tooluse and Previous Task Suite

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Skill Learning | Tooluse and Previous Task Suite (Hellaswag, Humaneval, IFeval, MMLU, TruthfulQA, Winogrande) | Tooluse: 70.6 | 5