CSQA

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Commonsense Reasoning | CSQA | Accuracy | 96 | 366 |
| Commonsense Question Answering | CSQA (test) | Accuracy | 0.953 | 127 |
| Commonsense Reasoning | CSQA (test) | Accuracy | 89.4 | 111 |
| Hallucination Detection | CSQA | AUROC | 72.47 | 55 |
| Commonsense Question Answering | CSQA | Accuracy | 82.72 | 44 |
| Malicious Agent | CSQA | ASR@3 | 0.49 | 28 |
| Prompt Injection | CSQA | ASR@3 | 18.33 | 28 |
| Retrieval-augmented Reasoning | CSQA | Accuracy | 85.42 | 25 |
| Commonsense Reasoning | CSQA | CSQA Accuracy | 91.2 | 21 |
| Commonsense Question Answering | CSQA | PIQA | 84.06 | 18 |
| Question Answering | CSQA (test) | Accuracy | 78.5 | 18 |
| Prompt Injection Defense | CSQA | ASR@3 | 13.4 | 16 |
| Commonsense Reasoning | CSQA (dev) | Accuracy | 85.42 | 16 |
| Simple Reasoning | CSQA | Accuracy | 91.75 | 15 |
| Commonsense Question Answering | CSQA | Accuracy | 85.1 | 12 |
| Commonsense Reasoning | CSQA | Accuracy | 91.5 | 12 |
| Question Answering | CSQA (in-domain) | Accuracy | 83.78 | 12 |
| Commonsense Question Answering | CSQA (OOD) | Accuracy | 63.8 | 10 |
| Multiple Choice Question Answering | CSQA (dev) | Accuracy | 71.1 | 10 |
| Ranking correlation with full dataset evaluation | CSQA | Kendall Correlation | 0.83 | 10 |
| Commonsense Reasoning | CSQA | PIQA | 84.98 | 9 |
| Question Answering | CSQA | µbias | 0.9 | 8 |
| Multiple Choice Question Answering | CSQA (test) | Accuracy | 82.2 | 8 |
| Scaling Law Prediction | CSQA | MAE | 0.0255 | 7 |
| Question Answering | CSQA | Accuracy | 69.2 | 7 |
Showing 25 of 43 rows