| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Membership Inference | Video modality | AUC-ROC: 100 | 16 |
| Video Analytics | Video | Cost per 1K Requests: 0.49 | 15 |
| Video reasoning | Video-R1 | VSI: 44.3 | 12 |
| Sequential Recommendation | Video | NDCG@5: 2.17 | 8 |
| Future item recommendation | Video | Recall: 11.3 | 7 |
| Video Semantic Segmentation | 1024 x 512 resolution (video) | Speed (FPS): 18.15 | 6 |
| Point Tracking | 24-frame video | Throughput: 23,405.71 | 5 |
| Session-based recommendation | VIDEO | Recall@20: 66.24 | 5 |
| Visual Dubbing | Video 3-second 25fps 512x512 resolution | Inference Time (s): 1 | 4 |
| Object Detection | video (train) | Accuracy: 93 | 4 |
| Misalignment reduction | Video #5 | ITF (dB): 21.66 | 3 |
| Misalignment reduction | Video #4 | ITF (dB): 19.26 | 3 |
| Misalignment reduction | Video #3 | ITF (dB): 22.26 | 3 |
| Misalignment reduction | Video 1 | ITF (dB): 17.54 | 3 |
| Multi-object Backdoor Attack | Video 9 | ASR: 1 | 3 |
| Multi-object Backdoor Attack | Video 8 | ASR: 100 | 3 |
| Multi-object Backdoor Attack | Video 7 | ASR: 99.97 | 3 |
| Segment Anything | 14 new Video | mIoU (1-click): 69.6 | 3 |
| Automatic Speech Recognition | Video Average of 7 subsets | WER: 0.027 | 3 |
| Frame Interpolation | 720p Video | Inference Time (s): 0.393 | 3 |