Pooled tasks

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Language Model Evaluation | Pooled tasks Table 5 Llama-3.1 3.3 (various) | Pooled Accuracy Estimate (γ̂): 57.15 | 21