MMLU, ARC-Challenge, and CommonsenseQA

Benchmarks

Task Name	Dataset Name	SOTA Result	Trend
General Language Modeling	MMLU, ARC-Challenge, and CommonsenseQA Aggregate	Average Score: 64.77		24
