
Short-context benchmarks

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Question Answering and Commonsense Reasoning | Short-context benchmarks ARC-C, ARC-E, PIQA, Winogrande, HellaSwag | ARC-C Accuracy: 45.76
17