
Aggregated Evaluation Suite

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
General Reasoning | Aggregated Evaluation Suite (Coding, Math, Science) | Code Average: 20.38 | 21
Language Understanding | Aggregated Evaluation Suite (10% retention rate) | Macro Rank: 1.6983 | 4