EntailmentBank

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Conclusion Generation | EntailmentBank (test) | BLEU: 67 | 26 |
| Natural Language Inference | EntailmentBank (test) | BLEU: 54 | 20 |
| Reasoning quality evaluation | ENTAILMENTBANK | Somers' D: 0.1773 | 15 |
| Explanation Refinement | EntailmentBank | Initial Score: 25.33 | 15 |
| Explanatory Inference | EntailmentBank | BLEU: 57 | 12 |
| Entailment Tree Generation | EntailmentBank Task 3 (Full Unseen) | Leaves F1: 47.1 | 10 |
| Entailment Tree Generation | EntailmentBank Task 2 (Distractors) | Leaves F1: 90.3 | 6 |
| Entailment Tree Generation | EntailmentBank Task 1 (No Distractors) | Leaves F1: 100 | 6 |
| Entailment tree generation | EntailmentBank (test) | Leaves F1: 45.6 | 5 |
| Entailment tree generation | EntailmentBank 50 samples (test) | FV: 100 | 4 |
| Question Answering | EntailmentBankQA Easy (test) | Answer Accuracy: 70.8 | 3 |
| Entailment Reasoning | EntailmentBank | Accuracy: 81.8 | 2 |