| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Video Question Answering | TGIF-QA | Accuracy | 97.2 | 147 |
| Video Question Answering | TGIF-QA (test) | Accuracy | 95.5 | 89 |
| Transition Video Question Answering | TGIF-QA (test) | Accuracy | 97.6 | 28 |
| Video Question Answering | TGIF-QA FrameQA | Accuracy | 78.7 | 19 |
| Video Question Answering | TGIF-QA Action original (test) | Accuracy | 95 | 17 |
| Video Question Answering | TGIF-QA zero-shot (test) | Accuracy | 81.3 | 15 |
| Frame-QA | TGIF-QA | Accuracy | 69.5 | 14 |
| Transition Question Answering | TGIF-QA | Accuracy | 97.6 | 14 |
| Video Question Answering | TGIF-QA original (test) | Repetition Count Loss (Mean L2) | 4.2825 | 13 |
| Video Question Answering | TGIF-QA-R (test) | Action Accuracy | 0.665 | 12 |
| Video Question Answering | TGIF-QA v2 (test) | Action Acc | 68.2 | 12 |
| Video Question Answering | TGIF-QA LLaVA-Hound out-of-domain (test) | Accuracy | 65.5 | 11 |
| Video Question Answering | TGIF-QA-R Transition curated (test) | Accuracy | 73.8 | 6 |
| Video Question Answering | TGIF-QA-R Action curated (test) | Accuracy | 61 | 6 |
| Repetition Count | TGIF-QA (test) | MSE | 3.82 | 5 |
| Transition Question Answering | TGIF-QA-R | Accuracy | 71 | 4 |
| Video Question Answering | TGIF-QA-R | Accuracy | 65.7 | 4 |
| Video Question Answering | TGIF-QA FrameQA (test) | Accuracy | 51.4 | 3 |