| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Video Question Answering | AGQA v2 (test) | Object-relation 57.8 | 30 |
| Video Question Answering | AGQA 2.0 | Object-relation Accuracy 62.27 | 15 |
| Video Question Answering | AGQA v2 | Object Relations Acc 57.8 | 12 |
| Video Question Answering | AGQA-Decomp 1.0 (test) | c-F1 63.75 | 12 |
| Video Question Answering | AGQA v1 (test) | Binary Accuracy 63.36 | 12 |
| Temporal VQA | AGQA | Accuracy 58.4 | 10 |
| Grounded Question Answering | AGQA (test) | mIoU 27.7 | 8 |
| Video Question Answering | AGQA-Decomp Novel Composition Setting | Accuracy 36.33 | 6 |
| Video Question Answering | AGQA (more compositional steps) | Binary Accuracy 52.24 | 5 |
| Video Question Answering | AGQA (novel compositions) | Binary Accuracy 49.27 | 5 |
| Video Question Answering | AGQA | Accuracy 52.7 | 5 |
| Video Question Answering | AGQA Balanced (more compositional steps) 2.0 (test) | Binary Acc 48.09 | 4 |
| Video Question Answering | AGQA Balanced indirect references 2.0 (test) | Precision (Object-B) 76.93 | 4 |
| Video Question Answering | AGQA Balanced novel compositions 2.0 (test) | Sequencing (Binary) Acc 50.88 | 4 |