
ComVE

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Commonsense Validation and Explanation | ComVE (test) | Accuracy: 98.5 | 13
Natural Language Explanation Generation | ComVE | Human Evaluation Score: 70 | 7
Natural Language Explanation Generation | ComVE few-shot 60-shot | Accuracy: 68.03 | 3
Commonsense Validation and Explanation | ComVE | Performance (F+B -> P+B): 0.88 | 2