| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Video Action Recognition | Perception Test | Top-1 Accuracy: 60.9 | 16 | |
| Short Video Question Answering | Perception Test (val) | Accuracy: 74.3 | 9 | |
| Video Question Answering | Perception Test zero-shot few-shot | Accuracy: 50.2 | 6 | |
| Sound Localisation | Perception Test (val) | AP@0.1: 37.5 | 2 | |
| Action Localisation | Perception Test (val) | AP@0.1: 33.5 | 2 | |