
HellaSwag

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Commonsense Reasoning | HellaSwag | Accuracy 99.21 | 1,460 |
| Common sense reasoning | Hellaswag | Accuracy 93.87 | 164 |
| Reasoning | HellaSwag (HS) | HellaSwag Accuracy 86.31 | 142 |
| Sentence Completion | HellaSwag | Accuracy 87.5 | 133 |
| Multiple Choice Question Answering | HellaSwag | Accuracy 79.19 | 59 |
| Common Sense Reasoning | HellaSwag (test) | Accuracy 83.9 | 45 |
| Commonsense Reasoning | HellaSwag | Zero-shot Accuracy 60.04 | 36 |
| Commonsense Reasoning | HellaSwag 10-shot (test) | Accuracy 82.53 | 34 |
| Commonsense Reasoning | HellaSwag | Accuracy (Baseline) 82.94 | 31 |
| Zero-shot Reasoning | HellaSwag | Accuracy 76.3 | 29 |
| Commonsense Reasoning | HellaSwag | HellaSwag Score 95.45 | 27 |
| Commonsense Reasoning | HellaSwag (val) | Accuracy 95.3 | 25 |
| Common Sense Reasoning | HellaSwag 0-shot | Accuracy 84.4 | 22 |
| Multilingual Commonsense Reasoning | M-Hellaswag | Accuracy (zh) 79.2 | 21 |
| Commonsense Reasoning | HELLASWAG (test) | Accuracy 95.6 | 21 |
| LLM Performance Estimation | HellaSwag (test) | MAE (%) 0.827 | 20 |
| Zero-shot Prediction | HellaSwag | Zero-shot HellaSwag Accuracy 57.14 | 17 |
| Commonsense reasoning | HellaSwag 1.0 (test) | Accuracy 85.6 | 17 |
| Commonsense Reasoning | Hellaswag Multilingual (test) | Accuracy 83.1 | 16 |
| Commonsense Reasoning | Hellaswag non-EU languages (test) | Accuracy 80.4 | 16 |
| Science Completion | HellaSwag | Accuracy 95.2 | 16 |
| Commonsense Reasoning | HellaSwag published (test) | Accuracy 82.35 | 15 |
| Natural Language Understanding | HellaSwag | Accuracy 40.89 | 15 |
| Sentence completion | HellaSwag (test) | Accuracy 72.35 | 15 |
| Commonsense Reasoning | Hellaswag 24 official EU languages | Accuracy 84.3 | 14 |
Showing 25 of 59 rows