
Aggregate Suite

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
--- | --- | --- | ---
General Language Performance | Aggregate Suite | Average Score: 78.03 | 14
Text Clustering | Aggregate Suite (test) | Macro Accuracy: 82.2 | 14
General Evaluation | Aggregate Suite (PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c) | Average Score: 69 | 10