WorkDL

CommonsenseQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Selective Prediction | CommonsenseQA | Power: 0.9999 | 207 |
| Question Answering | CommonsenseQA | Accuracy: 89.3 | 150 |
| Commonsense Reasoning | CommonSenseQA | Accuracy: 91.2 | 136 |
| Question Answering | CommonsenseQA (CSQA) | Accuracy: 91.2 | 124 |
| Commonsense Reasoning | CommonsenseQA | Accuracy (pass@1): 86.6 | 108 |
| Commonsense Question Answering | CommonSenseQA | Accuracy: 88.9 | 92 |
| Hallucination Detection | CommonsenseQA | Mean AUROC: 0.7563 | 62 |
| Commonsense Reasoning | CommonsenseQA (test) | Accuracy: 90 | 62 |
| Question Answering | CommonsenseQA (test) | Accuracy: 83.3 | 60 |
| Question Answering | CommonsenseQA IH (test) | Accuracy: 88.9 | 57 |
| Commonsense Reasoning | CommonsenseQA (CSQA) | Accuracy: 85.7 | 56 |
| Commonsense Reasoning | CommonSenseQA | BS: 0.1054 | 54 |
| Question Answering | CommonsenseQA IH (dev) | Accuracy: 82.7 | 53 |
| Commonsense Reasoning | CommonsenseQA (val) | Accuracy: 82.06 | 52 |
| Question Answering | CommonsenseQA | AUC: 74.48 | 51 |
| Commonsense Reasoning | CommonsenseQA (CSQA) v1.0 (test) | Accuracy: 64.11 | 46 |
| Commonsense Reasoning | CommonsenseQA Non-Math | Accuracy: 87.31 | 32 |
| Retrieval | CommonsenseQA | Accuracy: 86.81 | 25 |
| Commonsense Question Answering | CommonsenseQA (CSQA) (val) | Accuracy: 75.7 | 23 |
| Commonsense Question Answering | CommonsenseQA v1.0 (dev) | Accuracy: 79.3 | 22 |
| Question Answering | CommonsenseQA (CSQA) (test) | VWR: 40.85 | 21 |
| Multiple-choice Question Answering | CommonsenseQA (CSQA) | Accuracy: 66.4 | 21 |
| Veracity Inference | COMMONSENSEQA 1,000 examples | Mean Hamming Similarity: 0.935 | 20 |
| Knowledge | CommonSenseQA CoQA | Score: 66.91 | 20 |
| Commonsense Question Answering | CommonsenseQA blind v1.0 (test) | Accuracy: 75.3 | 20 |
Showing 25 of 53 rows
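The listing gives metric names but not their definitions, and the scales differ (Accuracy and AUC appear as percentages, Mean AUROC as a fraction). Below is a minimal sketch of how the two most frequent kinds of metric in the table are commonly computed. It assumes predictions and gold answers are stored as choice labels (e.g. "A" through "E") and that detection scores come with binary labels; the function names are hypothetical, and this is not the evaluation code behind any listed result.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Percentage of questions whose predicted choice label equals the gold label."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("need at least one example")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 1 marks the positive class, 0 the negative class.
    Returns the fraction of (positive, negative) pairs in which the positive
    example scores higher; tied pairs count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, not taken from any benchmark run.
    print(accuracy(["A", "C", "B", "E"], ["A", "C", "D", "E"]))  # 75.0
    print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

The pairwise loop is quadratic in the number of examples, which is fine for a few thousand questions; for larger sets a rank-based formulation or a library routine such as scikit-learn's `roc_auc_score` is the usual choice.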