| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Video-to-audio generation | LongVale | FD (VGG) | 3.23 | 8 |
| Omni-modal segment captioning | LongVALE 1.0 (test) | ROUGE-L | 0.224 | 8 |
| Omni-modal dense video captioning | LongVALE 1.0 (test) | SODA_c | 2.8 | 8 |
| Omni-modal temporal video grounding | LongVALE 1.0 (test) | R@0.3 | 15.7 | 8 |