| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Visual Commonsense Reasoning | VCR (val) | Accuracy | 82.57 | 63 |
| Visual Commonsense Reasoning | VCR-EN-Easy | EM | 95.7 | 27 |
| Visual Commonsense Reasoning (Q→A) | VCR Shortcut Mitigated Evaluation SM (test) | Q->A Accuracy (100%) | 76.32 | 13 |
| Visual Commonsense Reasoning (Q→A) | VCR Standard Evaluation (test) | Q->A Accuracy (3%, 1000 SP/C) | 61.49 | 13 |
| Visual Commonsense Reasoning Q→A | VCR (val) | Accuracy | 79.4 | 8 |
| Holistic Reasoning (Q -> AR) | VCR (val) | Accuracy | 55 | 8 |
| Rationale Selection (QA -> R) | VCR (val) | Accuracy | 76 | 8 |
| Visual Question Answering (Q -> A) | VCR (val) | Accuracy | 73.7 | 8 |
| Visual Commonsense Reasoning | VCR 1.0 (test) | Q->A Accuracy | 59 | 7 |
| Visual Commonsense Reasoning | VCR QR -> A (val) | Accuracy | 94.7 | 7 |
| Visual Commonsense Reasoning | VCR 1.0 (val) | Q->A Accuracy | 80.6 | 7 |
| Visual Commonsense Reasoning | VCR e-ViL (test) | METEOR Score | 26.4 | 6 |
| Visual Question Answering (Q -> A) | VCR (test) | Accuracy | 74 | 6 |
| Visual Commonsense Reasoning | VCR OC 1 | Accuracy | 36.93 | 4 |
| Visual Commonsense Reasoning | VCR1 MCI | Accuracy | 60.01 | 4 |
| Visual Commonsense Reasoning | VCR | Q->A Accuracy | 79.8 | 4 |
| Explanation Generation | VCR | Preference Rate (Ours) | 26.4 | 1 |