Commonsense

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Commonsense Reasoning | Commonsense | RCC: 74.6 | 29 |
| Commonsense Reasoning | Commonsense 8 Sub-Tasks | Accuracy (8 Sub-Tasks): 61.4 | 23 |
| Commonsense Reasoning | Commonsense170k (test) | BoolQ Accuracy: 75.4 | 22 |
| Commonsense Reasoning | Commonsense Gender (test) | Accuracy: 20 | 5 |
| Commonsense Reasoning | Commonsense Race (test) | Correctness Rate: 40.4 | 5 |
| Question Answering | CommonSense | Perplexity: 25.969 | 3 |