| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Video Recognition | SS v2 | Top-1 Acc | 69.4 | 47 |
| Action Recognition | SSv2 Few-shot | Top-1 Acc (5-way 1-shot) | 66.7 | 42 |
| Few-shot Action Recognition | SS Full meta v2 (test) | Accuracy | 69 | 38 |
| Action Recognition | SSv2 Small | Top-1 Acc (1-shot) | 60.5 | 26 |
| Text-to-Video Retrieval | SS label v2 | R@1 | 73.3 | 25 |
| Action Recognition | MiniSS zero-shot v2 | Top-1 Accuracy | 68.8 | 22 |
| Action Recognition | SS Full v2 | 1-shot Accuracy | 75.1 | 21 |
| Video Action Classification | SSv2 time-correlated (val) | Top-1 Accuracy | 48.25 | 21 |
| Video Action Recognition | SS v2 | Base Score | 19.6 | 15 |
| 5-way few-shot action recognition | SS small v2 (test) | 1-shot Accuracy | 57.5 | 13 |
| Topic Modeling | SS | IRBO | 100 | 13 |
| Topic Modeling | SS | NPMI | 0.146 | 13 |
| Document Clustering | SS (test) | NMI | 0.547 | 13 |
| Video Classification | SS v2 (test val) | Top-1 Accuracy | 77.5 | 12 |
| Action Recognition | SSv2 random distribution shifts (test) | Top-1 Accuracy | 46.32 | 12 |
| Video Tasks | SS v2 | Accuracy | 68.9 | 11 |
| Action-to-Video Retrieval | SSv2 events | mAP | 7.8 | 10 |
| Action-to-Video Retrieval | SS v2 | mAP | 4.3 | 10 |
| Base-to-novel generalization | SS v2 | Top-1 Acc (Base) | 18.3 | 9 |
| Language-driven motion control in Text-to-Video generation | SSv2 (val) | FVD | 19.27 | 8 |
| Video Action Recognition | SS v2 (val) | Top-1 Acc (K=2) | 8.9 | 8 |
| Category Splitting | SS Split Subset B v2 | Generality | 38.4 | 7 |
| Category Splitting | SSv2 Subset A | Generality | 0.463 | 7 |
| Text-to-Video Retrieval | SS Temporal zero-shot v2 | mAP | 15.2 | 7 |
| Action Recognition | SSv2 (split 1) | Accuracy | 57.2 | 7 |