| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Complex Reasoning | Video-TT | Accuracy 46.5 | 19 |
| Temporal and Textual Video Reasoning | Video-TT MC | Accuracy 46.8 | 9 |
| Video Understanding | Video-TT | Score 40.4 | 6 |
| Long-form video understanding and instruction following | Video-TT mc | Accuracy 44.3 | 3 |