| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Segment-level Video Captioning | ViTT Cooking (test) | BLEU-1 41.61 | 9 |
| Segment-level Video Captioning | ViTT-All (test) | BLEU-1 43.34 | 9 |
| Event localization | ViTT (test) | Recall 45.89 | 4 |
| Event Captioning | ViTT (test) | CIDEr 51.29 | 3 |
| Dense Video Captioning | ViTT (test) | SODA_c 25 | 2 |
| Video Captioning | ViTT | BLEU-1 37.89 | 2 |
| Segment-level Video Captioning | ViTT Cooking 1.0 (test) | BLEU-1 - | 0 |
| Segment-level Video Captioning | ViTT-All 1.0 (test) | BLEU-1 - | 0 |