| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Temporal Jigsaw Puzzle Solving | CLEVRER | Normalized Kendall Distance | 0 | 13 |
| Temporal and Causal Video Reasoning | CLEVRER-Humans (test) | Accuracy (Per Option) | 74.1 | 12 |
| Visual Question Answering | CLEVRER 1.0 (test) | Descriptive Accuracy | 0.94 | 8 |
| Video Question Answering | CLEVRER (test) | Descriptive Accuracy | 96.46 | 7 |
| Video Generation | CLEVRER 256x256 (test) | FVD | 87.4 | 6 |