HellaSwag

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Commonsense Reasoning | HellaSwag | Accuracy 99.21 | 1,891
Commonsense Reasoning | HellaSwag | HellaSwag Accuracy 84 | 350
Sentence Completion | HellaSwag | Accuracy 87.5 | 276
Common sense reasoning | Hellaswag | Accuracy 93.87 | 213
Reasoning | HellaSwag (HS) | HellaSwag Accuracy 86.31 | 162
Multiple Choice Question Answering | HellaSwag | Accuracy 93.59 | 93
Commonsense Inference | HellaSwag | Accuracy 88.97 | 91
Commonsense Reasoning | HellaSwag | HellaSwag Score 95.45 | 55
Zero-shot Reasoning | HellaSwag | Accuracy 76.3 | 48
Commonsense Reasoning | HellaSwag | Accuracy 95.1 | 47
Common Sense Reasoning | HellaSwag (test) | Accuracy 83.9 | 45
Commonsense Reasoning | HellaSwag | Zero-shot Accuracy 60.04 | 36
Commonsense Reasoning | HellaSwag 10-shot (test) | Accuracy 82.53 | 34
Common Sense Reasoning | HellaSwag 0-shot | Accuracy 84.4 | 34
Commonsense Reasoning | HellaSwag | Accuracy (Baseline) 82.94 | 31
Commonsense Reasoning | Hellaswag | HS Score 44.5 | 28
General Knowledge | HellaSwag | Accuracy 91.7 | 27
Commonsense Reasoning | HellaSwag (val) | Accuracy 95.3 | 25
Multilingual Commonsense Reasoning | M-Hellaswag | Accuracy (zh) 79.2 | 21
Commonsense Reasoning | HELLASWAG (test) | Accuracy 95.6 | 21
LLM Performance Estimation | HellaSwag (test) | MAE (%) 0.827 | 20
Common Sense Reasoning | HellaSwag (dev) | Accuracy 95.4 | 20
Commonsense | HellaSwag 10-shot | Accuracy (10-shot) 64.85 | 19
Zero-shot Prediction | HellaSwag | Zero-shot HellaSwag Accuracy 57.14 | 17
Commonsense reasoning | HellaSwag 1.0 (test) | Accuracy 85.6 | 17
Showing 25 of 91 rows