Pooled tasks

Benchmarks

Task Name	Dataset Name	SOTA Result	Trend
Language Model Evaluation	Pooled tasks Table 5 Llama-3.1 3.3 (various)	Pooled Accuracy Estimate (γ̂): 57.15		21
