Aggregate Suite

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
General Language Performance | Aggregate Suite | Average Score: 78.03 | 14
Text Clustering | Aggregate Suite (test) | Macro Accuracy: 82.2 | 14
General Evaluation | Aggregate Suite PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c | Average Score: 69 | 10