
GEN suite

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Generative Language Modeling and Problem Solving | GEN suite (IFEval, AIME25, GSM8K, GPQA, HumanEval, LCB) | IFEval Score: 90.4 | 5 |