| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Video Question Answering | NExT-QA (test) | Accuracy | 86.3 | 204 |
| Video Question Answering | NExT-QA (val) | Overall Acc | 88.4 | 176 |
| Video Question Answering | NEXT-QA | Overall Accuracy | 85.5 | 105 |
| Video Question Answering | NExT-QA Multi-choice | Accuracy | 83.2 | 102 |
| Video Question Answering | NExT-QA Main Dataset | Accuracy | 0.796 | 48 |
| Video Question Answering | NExT-QA ATPhard | Overall Accuracy | 58.4 | 27 |
| Video Question Answering | Next-QA v1 (test) | Overall Acc | 73.8 | 24 |
| Video Question Answering | NExT-QA Hard Split (val) | Causal Accuracy | 51.8 | 14 |
| Multi-choice Video Question Answering | NExT-QA | Descriptive Accuracy | 72.7 | 12 |
| Temporal VQA | Next-QA ATP | Accuracy | 27.6 | 10 |
| Video Question Answering | NExT-QA v1 (val) | Accuracy | 73.8 | 9 |
| Video Question Answering | NExT-QA zero-shot (test) | Temporal Score | 63.1 | 7 |
| Video Question Answering | NExT-QA zero-shot | Accuracy | 0.636 | 7 |
| Multiple-Choice Video Question Answering | NExT-QA (val) | Accuracy | 69.2 | 6 |
| Video Question Answering | NExT-QA ATP-hard (val) | Acc@C (Causal) | 48.65 | 6 |
| Video Question Answering | NEXT-QA open-form generation | WUPS | 34.3 | 5 |
| Video Question Answering | NExT-QA original (test) | Temporal Score | 67 | 4 |
| Video Question Answering | NExT-QA de-biased set | Temporal Reasoning Score | 60.8 | 4 |
| Temporal and Causal Reasoning | NEXT-QA (test) | WUPS | 30.3 | 4 |
| Video Question Answering | NExT-QA Hard Split - Causal | Accuracy | 56.4 | 3 |
| Video Question Answering | NExT-QA Hard Split - Temporal | Accuracy | 49.8 | 3 |
| Video QA | NExT-QA | Accuracy | 38.3 | 2 |