Pooled tasks

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Language Model Evaluation | Pooled tasks Table 5 Llama-3.1 3.3 (various) | Pooled Accuracy Estimate (γ̂): 57.15 | 21 |