| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Visual Entailment | e-SNLI-VE e-ViL (test) | Human Eval | 85.7 | 7 |
| Visual Entailment | E-SNLI-VE | Accuracy | 79.9 | 7 |
| Multimodal Explanation | e-SNLI-VE | F1 Score | 64.9 | 6 |
| Explanation Generation | E-SNLI-VE (test) | BLEU-1 | 40.6 | 6 |
| Explanation Generation | e-SNLI-VE | Preference: Prefer Ours | 41.1 | 1 |