
Short-context benchmarks

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Question Answering and Commonsense Reasoning | Short-context benchmarks (ARC-C, ARC-E, PIQA, Winogrande, HellaSwag) | ARC-C Accuracy: 63.48 | 45
Multiple Choice Question Answering and Reasoning | Short Context Benchmarks (MMLU, SciQ, OQA, CQA, SIQA, PIQA, HellaSwag, WinoGrande, ARC-c, ARC-e) | MMLU Accuracy: 74.25 | 10