CODAH

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Question Answering | CODAH (test) | Accuracy 85.79 | 24 |
| Commonsense Sentence Completion | CODAH (test) | Accuracy 84.3 | 6 |
| Commonsense Reasoning | CODAH Synonym Replacement WordNet-based (test) | Accuracy 76.2 | 6 |
| Robustness to TextFooler-based adversarial attacks | CODAH (test) | Failure Rate 30.9 | 6 |
| Scene Completion | X-CODAH | Score (EN) 69.9 | 6 |
| Explanation self-consistency | CODAH (test) | Accuracy 83.39 | 3 |