
CommonsenseQA

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Selective Prediction | CommonsenseQA | Power | 0.9999 | 207 |
| Question Answering | CommonsenseQA | Accuracy | 89.3 | 148 |
| Commonsense Reasoning | CommonSenseQA | Accuracy | 91.2 | 136 |
| Question Answering | CommonsenseQA (CSQA) | Accuracy | 91.2 | 124 |
| Commonsense Question Answering | CommonSenseQA | Accuracy | 88.9 | 83 |
| Commonsense Reasoning | CommonsenseQA (test) | Accuracy | 90 | 62 |
| Question Answering | CommonsenseQA (test) | Accuracy | 83.3 | 60 |
| Question Answering | CommonsenseQA IH (test) | Accuracy | 88.9 | 57 |
| Commonsense Reasoning | CommonsenseQA (CSQA) | Accuracy | 85.7 | 56 |
| Commonsense Reasoning | CommonSenseQA | BS | 0.1054 | 54 |
| Question Answering | CommonsenseQA IH (dev) | Accuracy | 82.7 | 53 |
| Commonsense Reasoning | CommonsenseQA (val) | Accuracy | 82.06 | 52 |
| Hallucination Detection | CommonsenseQA | Mean AUROC | 0.7563 | 48 |
| Commonsense Reasoning | CommonsenseQA (CSQA) v1.0 (test) | Accuracy | 64.11 | 46 |
| Commonsense Reasoning | CommonsenseQA | Accuracy (pass@1) | 86.6 | 45 |
| Commonsense Reasoning | CommonsenseQA Non-Math | Accuracy | 87.31 | 32 |
| Retrieval | CommonsenseQA | Accuracy | 86.81 | 25 |
| Commonsense Question Answering | CommonsenseQA (CSQA) (val) | Accuracy | 75.7 | 23 |
| Commonsense Question Answering | CommonsenseQA v1.0 (dev) | Accuracy | 79.3 | 22 |
| Multiple-choice Question Answering | CommonsenseQA (CSQA) | Accuracy | 66.4 | 21 |
| Veracity Inference | COMMONSENSEQA 1,000 examples | Mean Hamming Similarity | 0.935 | 20 |
| Knowledge | CommonSenseQA CoQA | Score | 66.91 | 20 |
| Commonsense Question Answering | CommonsenseQA blind v1.0 (test) | Accuracy | 75.3 | 20 |
| Multiple-choice Question Answering | CommonsenseQA (dev) | Accuracy | 76.2 | 18 |
| Question Answering | CommonsenseQA ConceptNet (20% test) | Accuracy | 81.3 | 16 |
Showing 25 of 37 rows