| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Visual Dialog | VisDial v0.9 (val) | MRR 69.35 | 141 |
| Visual Dialog | VisDial v1.0 (test-std) | NDCG 75.13 | 77 |
| Visual Dialog | VisDial 1.0 (val) | MRR 0.6951 | 65 |
| Visual Dialog | VisDial v0.9 (test) | MRR 64.1 | 58 |
| Visual Dialog Retrieval | VisDial v1.0 (test-standard) | MRR 67.25 | 51 |
| Visual Dialog | VisDial | MRR 62.27 | 36 |
| Visual Dialog | VisDial 0.5 (test) | MRR 0.635 | 27 |
| Visual Dialogue | VisDial v1.0 (test) | NDCG 75.2 | 26 |
| Retrieval | VisDial (test) | Avg R 49.29 | 12 |
| Visual Dialogue | VisDial | VisDial Accuracy 70.9 | 10 |
| Visual Dialogue | VisDial (test) | MRR 45.9 | 9 |
| Contextual Image Generation | VisDial 1.0 (test) | CLIP Similarity 0.645 | 9 |
| Visual Dialogue | VisDial (val) | NDCG 75.4 | 9 |
| Visual Dialog | VisDial v0.5 (test) | MRR 63.5 | 7 |
| Chat-based Image Retrieval | VisDial (val) | Hits@1 (R1) 29.53 | 6 |
| Visual Dialogue | VisDial | NDCG 0.511 | 5 |
| Visual Dialog | VisDial (val) | RAD (Y/N <- C) 62.08 | 2 |
| Visual Dialog | VisDial v1.0 (val) | Y/N Accuracy (C) 68.99 | 2 |
| Interactive Text-to-Image Retrieval | VisDial (val) | #ARNSR 3.41 | 2 |
| Visual Dialogue | VisDial (test-std) | NDCG 75.4 | 2 |
| Question Generation | VisDial v0.9 (val) | MRR 0.4138 | 2 |