| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Video Action Recognition | UCF101 | Top-1 Acc 96.9 | 153 |
| Action Recognition | UCF-101 | Top-1 Acc 99.7 | 147 |
| Video Frame Interpolation | UCF101 | PSNR 35.44 | 117 |
| Video Generation | UCF-101 (test) | Inception Score 89.27 | 105 |
| Video Interpolation | UCF-101 (test) | PSNR 35.8 | 65 |
| Action Recognition | UCF-101 | Accuracy 95.5 | 44 |
| Action Recognition | UCF101 base-to-new | Base Performance 89.71 | 36 |
| Action Recognition | UCF-101 fine-tuning protocol | Accuracy 96.1 | 35 |
| Action Recognition | UCF-101 | 3-Fold Accuracy 99.6 | 32 |
| Image Classification | UCF-101 | Accuracy 75.2 | 30 |
| Video Classification | UCF101 | Accuracy 93.7 | 29 |
| Video Reconstruction | UCF-101 | rFVD 20 | 28 |
| Video Recognition | UCF51 (split 1) | Top-1 Acc 70 | 27 |
| Video Retrieval | UCF51 | Recall@1 0.676 | 27 |
| Adversarial Video Purification | UCF-101 | Clean Accuracy 96 | 24 |
| Classification | UCF101 | AURC 0.226 | 23 |
| Action Recognition | UCF-101 | Base Accuracy 96.8 | 23 |
| Video Classification | UCF-101 | Avg Acc 91.2 | 19 |
| Class-conditioned Video Generation | UCF101 (test) | Fréchet Video Distance 36 | 19 |
| Audio-visual Zero-Shot Classification | UCF GZSL cls (test) | S (Seen Accuracy) 74.79 | 19 |
| Class-Conditional Video Generation | UCF101 | gFVD 57 | 19 |
| Video Reconstruction | UCF-101 (test) | rFVD 8.6 | 17 |
| Video Generation | UCF-101 | FVD 58 | 17 |
| Audio-visual Zero-Shot Learning | UCF GZSL | S (Seen Accuracy) 91.99 | 17 |
| Few-shot action classification | UCF101 | Accuracy 96.1 | 17 |