
CSQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Commonsense Reasoning | CSQA | Accuracy 96 | 366 |
| Commonsense Reasoning | CSQA | CSQA Accuracy 91.2 | 195 |
| Commonsense Question Answering | CSQA (test) | Accuracy 0.953 | 127 |
| Commonsense Reasoning | CSQA (test) | Accuracy 89.4 | 111 |
| Hallucination Detection | CSQA | AUROC 85.1 | 107 |
| Commonsense Question Answering | CSQA | Accuracy 88.9 | 71 |
| Question Answering | CSQA (test) | Accuracy 87.2 | 68 |
| Commonsense Question Answering | CSQA | Accuracy 82.72 | 61 |
| Prompt Injection | CSQA | ASR@3 18.33 | 52 |
| Prompt Injection | CSQA | ASR 62 | 36 |
| Question Answering | CSQA | Accuracy 88 | 36 |
| Commonsense Reasoning | CSQA OOD (test) | Accuracy 82.1 | 32 |
| Commonsense Reasoning | CSQA | Task Success Rate (TSR) 82.5 | 30 |
| Malicious Agent | CSQA | ASR@3 0.49 | 28 |
| Retrieval-augmented Reasoning | CSQA | Accuracy 85.42 | 25 |
| Memory Attack | CSQA | ASR@3 1.67 | 24 |
| Commonsense Reasoning | CSQA | Accuracy 88.1 | 20 |
| General Knowledge QA | CSQA | Accuracy 84.1 | 18 |
| Commonsense Reasoning | CSQA | Accuracy (CSQA) 66.4 | 18 |
| Commonsense Question Answering | CSQA | PIQA 84.06 | 18 |
| Hallucination Detection | CSQA (CommonsenseQA) | AUROC (128 steps) 84.7 | 16 |
| Prompt Injection Defense | CSQA | ASR@3 13.4 | 16 |
| Commonsense Reasoning | CSQA (dev) | Accuracy 85.42 | 16 |
| General Reasoning | CSQA | Accuracy 81.3 | 15 |
| Simple Reasoning | CSQA | Accuracy 91.75 | 15 |
Showing 25 of 70 rows