| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Video Question Answering | M3IT IVQA | ROUGE-L 63.9 | 15 |
| Video Question Answering | M3IT MSRVTT-QA | ROUGE-L 48.7 | 15 |
| Video Question Answering | M3IT ActivNetQA | ROUGE-L 63.6 | 15 |
| Visual Machine Reading Comprehension | M3IT VisualMRC | ROUGE-L 57.4 | 15 |
| Visual Question Answering | M3IT VIQUAE | ROUGE-L 50.2 | 15 |
| Image Captioning | M3IT COCO | ROUGE-L 38.8 | 15 |
| Visual Commonsense Reasoning | M3IT VCR (test) | F1 Score 67.1 | 7 |