| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Video Question Answering | NExT-GQA (test) | Acc@GQA 39.6 | 29 |
| Grounded Video Question Answering | NExT-GQA | mIoU 61.2 | 28 |
| Grounded Video Question Answering | NExT-GQA (test) | mIoU 39.2 | 24 |
| Grounded Question Answering | NExT-GQA (test) | Accuracy @ GQA 27.6 | 23 |
| Temporal Question Grounding | NExT-GQA | mIoU 0.442 | 14 |
| Temporal Video Grounding | NExT-GQA (test) | mIoU 25.7 | 6 |
| Temporal Grounding | NExT-GQA | mIoU 37.6 | 3 |