
ECQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Commonsense Reasoning | ECQA | MV Score: 77.4 | 27 |
| Reasoning | ECQA | CACC: 83.96 | 25 |
| Question Answering | ECQA | Accuracy: 70.34 | 12 |
| Natural Language Explanation Generation | ECQA | Human Evaluation Score: 73.33 | 7 |
| Commonsense Question Answering | ECQA (test) | Accuracy: 79.7 | 7 |
| Explanation Generation | ECQA (out-domain) | Grammar Score: 2.99 | 7 |
| Natural Language Explanation Generation | ECQA (test) | Accuracy: 59.4 | 6 |
| Explanation Generation | ECQA complete (test) | BERTScore: 87.67 | 6 |
| Explanation self-consistency | ECQA (test) | Accuracy: 71.11 | 4 |
| Open-Label QA | ECQA | COS-E: 0.398 | 4 |
| CoT Soundness Evaluation | ECQA | CSR: 87 | 3 |
| CoT Naturalness | ECQA | Perplexity (PPL): 20.15 | 3 |
| Commonsense Reasoning | ECQA | Pass@1: 0.7612 | 3 |
| Natural Language Explanation Generation | ECQA few-shot 60-shot | Accuracy: 24.53 | 3 |
| Commonsense Question Answering | ECQA | Performance Score (Finetune Baseline vs Predict Baseline): 57.2 | 2 |