| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Video Captioning | YouCook2 | METEOR | 40.4 | 104 |
| Text-to-Video Retrieval | YouCook2 (val) | R@1 | 1,510 | 66 |
| Text-to-Video Retrieval | YouCook2 (test) | Recall@10 | 98.9 | 54 |
| Video Captioning | YouCook2 (test) | CIDEr | 190 | 42 |
| Dense Video Captioning | YouCook2 (val) | METEOR | 9.41 | 19 |
| Segment-level Video Captioning | YouCook2 | BLEU-4 | 15.2 | 17 |
| Image-Text Matching | YouCook2 EQBEN | Text Score | 59.22 | 14 |
| Event Localization | YouCook2 (val) | Recall | 32.51 | 13 |
| Event Captioning | YouCook2 1.0 (val) | METEOR | 12.8 | 12 |
| Streaming Narration | YouCook2 (test) | F1 | 35.55 | 10 |
| Text-to-Video-Audio Retrieval | YouCook2 | Recall@1 | 51.3 | 8 |
| Interaction Localization | YouCook2 Interactions 1.0 (test) | Localization Accuracy | 55.8 | 8 |
| Step Localization | YouCook2 | Recall | 77.4 | 7 |
| Event Captioning | YouCook2 | METEOR | 9.3 | 6 |
| Interaction Localization | YouCook2 Interactions (val) | Localization Accuracy | 70.4 | 4 |
| Event Localization | YouCook2 | F1 Score | 28.43 | 3 |
| Audio-to-Video Retrieval | YouCook2 | Recall@1 | 32.8 | 3 |
| Text-to-Video Retrieval | YouCook2 v1 (val) | R@1 | - | 0 |
| Segment-level Video Captioning | YouCook2 Segment-level (val) | BLEU-4 | - | 0 |