| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Spatio-Temporal Video Grounding | VidSTG Interrogative Sentences (test) | m_vIoU | 29.5 | 33 |
| Spatio-Temporal Video Grounding | VidSTG Declarative Sentences | m_vIoU | 34.4 | 20 |
| Spatio-Temporal Video Grounding | VidSTG Declarative Sentences (test) | m_vIoU | 33.14 | 17 |
| Spatio-Temporal Video Grounding | VidSTG Declarative (test) | m_vIoU | 34 | 14 |
| Spatio-Temporal Video Grounding | VidSTG Declarative Sentences 1.0 (test) | m_vIoU | 34 | 9 |
| Spatio-Temporal Video Grounding | VidSTG Interrogative Sentence | m_vIoU | 29.5 | 8 |
| Dense Video Object Captioning | VidSTG (test) | CapA | 43.9 | 5 |
| Spatio-Temporal Video Grounding | VidSTG Interrogative | m_tIoU | 47.3 | 3 |
| Spatio-Temporal Video Grounding | VidSTG Declarative | m_tIoU | 49 | 3 |