CSQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Commonsense Reasoning | CSQA | Accuracy 96 | 366 |
| Commonsense Question Answering | CSQA (test) | Accuracy 0.953 | 127 |
| Commonsense Reasoning | CSQA | CSQA Accuracy 91.2 | 126 |
| Commonsense Reasoning | CSQA (test) | Accuracy 89.4 | 111 |
| Hallucination Detection | CSQA | AUROC 85.1 | 107 |
| Commonsense Question Answering | CSQA | Accuracy 88.9 | 58 |
| Commonsense Question Answering | CSQA | Accuracy 82.72 | 44 |
| Commonsense Reasoning | CSQA OOD (test) | Accuracy 82.1 | 32 |
| Malicious Agent | CSQA | ASR@3 0.49 | 28 |
| Prompt Injection | CSQA | ASR@3 18.33 | 28 |
| Retrieval-augmented Reasoning | CSQA | Accuracy 85.42 | 25 |
| Commonsense Reasoning | CSQA | Accuracy (CSQA) 66.4 | 18 |
| Commonsense Question Answering | CSQA | PIQA 84.06 | 18 |
| Question Answering | CSQA (test) | Accuracy 78.5 | 18 |
| Hallucination Detection | CSQA (CommonsenseQA) | AUROC (128 steps) 84.7 | 16 |
| Prompt Injection Defense | CSQA | ASR@3 13.4 | 16 |
| Commonsense Reasoning | CSQA (dev) | Accuracy 85.42 | 16 |
| General Reasoning | CSQA | Accuracy 81.3 | 15 |
| Simple Reasoning | CSQA | Accuracy 91.75 | 15 |
| Reasoning | CSQA (leave-one-out setup) | Accuracy 83.8 | 12 |
| Commonsense Question Answering | CSQA | Accuracy 85.1 | 12 |
| Commonsense Reasoning | CSQA | Accuracy 91.5 | 12 |
| Question Answering | CSQA (in-domain) | Accuracy 83.78 | 12 |
| Commonsense Question Answering | CSQA (OOD) | Accuracy 63.8 | 10 |
| Question Answering | CSQA | Accuracy 70.8 | 10 |
Showing 25 of 56 rows