| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Video Spatial Grounding | ST-Align | vIoU@0.3: 60.3 | 5 |
| Event Localization and Captioning | ST-Align | tIoU@0.5: 60.4 | 4 |
| Spatial-Temporal Video Grounding | ST-Align | tIoU@0.5: 44.6 | 4 |
| Spatial Video Grounding | ST-Align | sIoU@0.3: 47.2 | 2 |