
SWAG

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Common Sense Reasoning | SWAG | Accuracy 92.29 | 24 |
| Commonsense Reasoning | SWAG (test) | Accuracy 0.9412 | 13 |
| Commonsense Reasoning | SWAG (dev) | Accuracy 91.2 | 11 |
| Ranking correlation with full dataset evaluation | SWAG | Kendall Correlation 0.93 | 10 |
| Commonsense Reasoning | SWAG (val) | Accuracy 85.5 | 9 |
| Commonsense Reasoning | SWAG (Generalization) | Accuracy (SWAG Generalization) 48.88 | 8 |
| Commonsense Reasoning | SWAG In-Domain (test) | Accuracy 83.14 | 8 |
| Natural Language Understanding | SWAG (dev) | Accuracy 92.59 | 6 |
| Grounded Commonsense Inference | SWAG (test) | Accuracy 88 | 6 |
| Grounded Commonsense Inference | SWAG (dev) | Accuracy 86.6 | 4 |
| Multiple-Choice | Swag (test) | Accuracy 80.85 | 3 |
| Question Answering | SWAG (dev) | Accuracy 0.908 | 3 |