General NLP Evaluation

Benchmarks

Dataset Name	SOTA Method	Metric	Trend
Natural Language Benchmarks Aggregate		Average Score: 62.31		30	5mo ago
The Pile Downstream Evaluation Suite	DOGE	HellaSwag Accuracy: 29.7		7	3mo ago
