| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Abstractive Summarization | How2 (test) | Content F1 48.9 | 18 |
| Multimodal Abstractive Summarization | How2 (test) | ROUGE-1 67.7 | 13 |
| Multimodal Abstractive Text Summarization | How2 300h (test) | ROUGE-1 48.85 | 9 |
| Multimodal Summarization | How2 | ROUGE-1 70.1 | 6 |
| Audio-Visual Automatic Speech Recognition | How2 zero-shot | WER 13.69 | 6 |
| Audio-Visual Automatic Speech Recognition | How2 | WER 9.11 | 5 |
| Video Summarization | How2 1.0 (test) | INF 3.89 | 3 |