| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Temporal Action Localization | ActivityNet 1.3 (val) | AP@0.5 | 59.3 | 257 |
| Text-to-Video Retrieval | ActivityNet | R@1 | 66.8 | 197 |
| Temporal Action Detection | ActivityNet v1.3 (val) | mAP@0.5 | 61.72 | 185 |
| Temporal Action Proposal | ActivityNet v1.3 (val) | AUC | 69.71 | 114 |
| Temporal Action Localization | ActivityNet 1.2 (val) | mAP@IoU 0.5 | 45.3 | 110 |
| Text-to-Video Retrieval | ActivityNet (test) | R@1 | 79.2 | 108 |
| Video-to-Text Retrieval | ActivityNet | R@1 | 58.2 | 99 |
| Temporal Action Detection | ActivityNet 1.3 | mAP@0.5 | 62.4 | 93 |
| Temporal Action Detection | ActivityNet 1.3 (test) | Average mAP | 38.5 | 80 |
| Video-to-Text Retrieval | ActivityNet (test) | R@1 | 71.9 | 63 |
| Temporal Action Proposal Generation | ActivityNet 1.3 (test) | AUC | 70.1 | 62 |
| Text-to-Video Retrieval | ActivityNet Captions (val1) | R@1 | 48.9 | 58 |
| Video Question Answering | ActivityNet (test) | Accuracy | 62.3 | 57 |
| Dense Video Captioning | ActivityNet Captions (val) | METEOR | 16.1 | 54 |
| Early Action Recognition | ActivityNet (test) | Top-1 Action Accuracy | 71.17 | 48 |
| Temporal Action Localization | ActivityNet v1.3 (test) | mAP @ IoU=0.5 | 50.6 | 47 |
| Temporal Grounding | ActivityNet Captions | Recall@1 (IoU=0.5) | 58.2 | 45 |
| Dense Video Captioning | ActivityNet Captions | METEOR | 10.03 | 43 |
| Video Grounding | ActivityNet Captions | R@1 (IoU=0.5) | 54.83 | 43 |
| Video-to-Text Retrieval | ActivityNet Captions | R@1 | 72.9 | 41 |
| Action Recognition | ActivityNet (test) | mAP | 94.3 | 38 |
| Video Retrieval | ActivityNet-Captions (test) | R@1 | 13.1 | 38 |
| Temporal Action Localization | ActivityNet 1.2 (test) | mAP@0.5 | 48.3 | 36 |
| Temporal Action Localization | ActivityNet 1.2 | mAP@0.5 | 45.3 | 32 |
| Temporal Action Localization | ActivityNet 1.3 | Average mAP | 38 | 32 |