Commonsense

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Commonsense Reasoning | Commonsense | RCC 74.6 | 29 |
| Commonsense Reasoning | Commonsense 8 Sub-Tasks | Accuracy (8 Sub-Tasks) 61.4 | 26 |
| Commonsense Reasoning | Commonsense170k (test) | BoolQ Accuracy 75.4 | 22 |
| Morality Evaluation | Commonsense | Mean Improvement 10 | 9 |
| Commonsense Reasoning | Commonsense-15K (test) | ARC-Challenge Accuracy 36.11 | 7 |
| Commonsense reasoning | Commonsense-15K | ARC-Challenge Accuracy 53.33 | 5 |
| Commonsense Reasoning | Commonsense Gender (test) | Accuracy 20 | 5 |
| Commonsense Reasoning | Commonsense Race (test) | Correctness Rate 40.4 | 5 |
| Question Answering | CommonSense | Perplexity 25.969 | 3 |