| Dataset Name | SOTA Method | Metric | Trend | Updated | |
|---|---|---|---|---|---|
| VCR (val) | VILLA | Accuracy 82.57 | 63 | 4d ago | |
| VCR (Visual Commonsense Reasoning) (test) | ModCR | Accuracy 94 | 54 | 4d ago | |
| VCR-EN-Easy | InternVL2.5-78B | EM 95.7 | 27 | 2d ago | |
| CommonsenseT2I | MetaQuery-L | Accuracy 57.67 | 13 | 4d ago | |
| GD-VCR (test) | GIVL (1M) | Accuracy 72.01 | 10 | 4d ago | |
| VisualCOMET | BLIP-2 ViT-G + LSKD | Acc@50 40.3 | 7 | 4d ago | |
| VCR 1.0 (test) | BLIP-2 ViT-G + LSKD | Q->A Accuracy 59 | 7 | 3d ago | |
| VCR QR -> A (val) | ModCR | Accuracy 94.7 | 7 | 4d ago | |
| VCR 1.0 (val) | MERLOT | Q->A Accuracy 80.6 | 7 | 3d ago | |
| VCR e-ViL (test) | NLX-GPT | Meteor Score 26.4 | 6 | 3d ago | |
| VisComet (val) | UNIFIED-IO XL | CIDEr 91.1 | 5 | 4d ago | |
| VCR OC 1 | MiniGPT-4 | Accuracy 36.93 | 4 | 4d ago | |
| VCR1 MCI | SELF-FILTER | Accuracy 60.01 | 4 | 4d ago | |
| VCR | UNITER-large | Q->A Accuracy 79.8 | 4 | 3d ago | |
| VCR (Visual Commonsense Reasoning) (dev) | VisualBERT | Q->A Accuracy 70.8 | 4 | 3d ago | |