| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Video Reasoning | Video-Holmes | Accuracy 60.5 | 83 |
| Video Reasoning | Video-Holmes | Score 46.7 | 34 |
| Video Understanding | Video-Holmes | Accuracy 58.3 | 27 |
| Video Reasoning | Video-Holmes | SR 56.1 | 22 |
| Long-video understanding | Video-Holmes (test) | Video-Holmes Score 43.4 | 20 |
| Video Reasoning | Video-Holmes | Accuracy 52.3 | 12 |
| Video Question Answering | Video-Holmes | Average Score 46.76 | 12 |
| Audio-Visual Question Answering | Video Holmes 32 frames | SR 64.4 | 8 |
| Audio-Visual Question Answering | Video-Holmes | Score 59.9 | 8 |
| Audio-Visual Understanding | Video-Holmes | Score 0.541 | 6 |
| Audiovisual Understanding & Reasoning | Video-Holmes | Score 59.2 | 4 |
| Video Understanding | Video-Holmes | Score 64.3 | 3 |
| Video Question Answering | Video-HOLMES stress (test) | Average Score 32 | 3 |