
Downstream Tasks Evaluation Suite

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Language Model Evaluation | Downstream Tasks Evaluation Suite Math, Code, Law, Know., Reason., MMLU | Math Accuracy: 4.92 | 9 |