| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Video Question Answering | NExT-QA (test) | Accuracy 86.3 | 204 |
| Video Question Answering | NExT-QA (val) | Overall Acc 88.4 | 176 |
| Video Question Answering | NExT-QA Multi-choice | Accuracy 84.5 | 114 |
| Video Question Answering | NEXT-QA | Overall Accuracy 85.5 | 105 |
| Video Question Answering | NExT-QA Main Dataset | Accuracy 0.796 | 48 |
| Video Question Answering | NExT-QA ATPhard | Overall Accuracy 71.2 | 33 |
| Video Question Answering | Next-QA v1 (test) | Overall Acc 73.8 | 24 |
| General video understanding | NExT-QA | Accuracy 86.3 | 21 |
| Video Understanding | NExT-QA | MC Score 80.2 | 19 |
| Multi-choice Video Question Answering | NExT-QA | Overall Accuracy 83.2 | 19 |
| Video Question Answering | NExT-QA zero-shot | Accuracy 0.845 | 17 |
| Video Question Answering | NExT-QA Hard Split (val) | Causal Accuracy 51.8 | 14 |
| Visual Question Answering | NExT-QA Novel Comp. (test) | AP 43.05 | 11 |
| Visual Question Answering | NExT-QA Standard (test) | AP 42.43 | 11 |
| Temporal VQA | Next-QA ATP | Accuracy 27.6 | 10 |
| Continual Video Question Answering | NExT-QA (test) | Accuracy 64.75 | 9 |
| Video Question Answering | NExT-QA v1 (val) | Accuracy 73.8 | 9 |
| General Video Understanding | NexT-QA (test) | Accuracy 83.8 | 8 |
| Video Question Answering | NExT-QA 5-180s (test) | Accuracy 80.54 | 8 |
| Video Question Answering | NExT-QA zero-shot (test) | Temporal Score 63.1 | 7 |
| Video QA | NExT-QA | Accuracy 79.7 | 7 |
| Multiple-Choice Video Question Answering | NExT-QA (val) | Accuracy 69.2 | 6 |
| Video Question Answering | NExT-QA ATP-hard (val) | Acc@C (Causal) 48.65 | 6 |
| Video Question Answering | NEXT-QA open-form generation | WUPS 34.3 | 5 |
| Video Question Answering | NExT-QA original (test) | Temporal Score 67 | 4 |