| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Text-to-Video Retrieval | LSMDC (test) | R@1 1,120 | 225 | |
| Text-to-Video Retrieval | LSMDC | R@1 46.4 | 154 | |
| Video-to-Text Retrieval | LSMDC | R@1 46.7 | 51 | |
| Movie Fill-in-the-Blank | LSMDC 2016 (test) | Accuracy 68.7 | 34 | |
| Text-to-Video Retrieval | LSMDC 1K videos (test) | R@1 1,790 | 16 | |
| Movie Retrieval | LSMDC 17 (public test) | Recall@1 9.1 | 16 | |
| Text-to-Video Retrieval | LSMDC zero-shot | Recall@1 25.2 | 15 | |
| Fill-in-the-blank Video Question Answering | LSMDC | Accuracy 63.5 | 13 | |
| Movie Retrieval | LSMDC 2016 (test) | R@1 3 | 13 | |
| Video-to-Text Retrieval | LSMDC Movie Description (test) | R@1 24.5 | 12 | |
| Multiple-choice Question Answering | LSMDC (test) | Accuracy 84.4 | 12 | |
| Multiple-Choice | LSMDC 2016 (test) | Accuracy 67 | 11 | |
| Video Clip Retrieval | LSMDC | R@1 9.1 | 10 | |
| Video Annotation | LSMDC 2016 (challenge) | CIDEr-D 0.109 | 10 | |
| Movie Description | LSMDC 2016 (public test) | BLEU-1 0.162 | 9 | |
| Text-to-Video Retrieval | LSMDC 38 (full-corpus) | R@1 5.7 | 8 | |
| Text-to-Video Retrieval | LSMDC 38 (test) | R@1 25.2 | 8 | |
| Video Question Answering | LSMDC Multiple Choice | Accuracy 84.4 | 8 | |
| Video-to-Text Retrieval | LSMDC (test) | MC Accuracy 73.9 | 8 | |
| Text-to-Video Retrieval | LSMDC (val) | R@1 0.178 | 7 | |
| Text-to-Video Retrieval | LSMDC 1,000 videos | R@1 24.4 | 7 | |
| Multiple Choice Video Question Answering | LSMDC MC (test) | Accuracy 86 | 7 | |
| Text-to-Video Retrieval | LSMDC 39 (val) | R@1 26.5 | 7 | |
| Identity-aware captioning | LSMDC (test) | CIDEr 9.09 | 6 | |
| Fill-in-the-blanks | LSMDC Task 2 (FITB) challenge (test) | Same-acc 0.606 | 4 | |