Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Standard Downstream Benchmarks

Benchmarks

Task NameDataset NameSOTA ResultTrend
Natural Language Understanding and ReasoningStandard Downstream Benchmarks Two-Shot (val)
ARC-E Accuracy (Normalized)56.86
11
Showing 1 of 1 rows