| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Video Reasoning | Video-Holmes | Accuracy | 60.5 | 37 |
| Video Reasoning | Video-Holmes | Score | 46.7 | 34 |
| Video Understanding | Video-Holmes | Accuracy | 58.3 | 27 |
| Video Reasoning | Video-Holmes | SR | 56.1 | 22 |
| Long-video understanding | Video-Holmes (test) | Video-Holmes Score | 43.4 | 20 |
| Video Question Answering | Video-Holmes | Average Score | 46.76 | 12 |
| Audio-Visual Question Answering | Video-Holmes | Score | 59.9 | 8 |
| Audio-Visual Understanding | Video-Holmes | Score | 0.541 | 6 |
| Audiovisual Understanding & Reasoning | Video-Holmes | Score | 59.2 | 4 |