| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Video Understanding | Video-MME | Overall Score 78.18 | 96 |
| Video Understanding | Video-MME | Overall Score 78.78 | 92 |
| Video Understanding | Video-MME without subtitles | Overall Score 75 | 89 |
| Video Understanding | Video-MME v1.0 (test) | Score (Short) 72.4 | 56 |
| Video Understanding | Video-MME (test) | Accuracy 88.6 | 51 |
| Video Question Answering | Video-MME | Accuracy (Average, wo/ Subtitle) 82.8 | 48 |
| Long Video Understanding | Video-MME Long | Accuracy 81.9 | 46 |
| Long Video Understanding | Video-MME long 1.0 | Accuracy (No Subs) 67.4 | 45 |
| Long Video QA | Video-MME | Average Score 84.3 | 41 |
| Long Video Understanding | Video-MME Overall | Accuracy 87 | 39 |
| Video Reasoning | Video-MME | Overall Performance 73.1 | 39 |
| Multi-modal Video Evaluation | Video-MME | Accuracy 75 | 38 |
| Video Question Answering | Video-MME Long | Accuracy 82 | 36 |
| Video Question Answering | Video-MME without subtitles | Accuracy (Overall) 73.3 | 34 |
| Video Question Answering | Video-MME Long Duration 1.0 | Accuracy (w/o subtitles) 67.4 | 34 |
| Video Multimodal Understanding | Video MME | Score 61.9 | 33 |
| Video Understanding | Video-MME Long | Accuracy (Long, wo Sub) 67.4 | 32 |
| Long Video Understanding | Video MME w/o sub (long) | Accuracy 71.4 | 30 |
| Long-video understanding | Video-MME | Overall Score 84.3 | 30 |
| Long-form Video Multimodal Evaluation | Video-MME Long | Video-MME-Long Score 56.6 | 24 |
| Long Video Understanding | Video-MME (w/o sub.) Overall 1010s | Accuracy 75 | 22 |
| Multimodal Video Evaluation | Video-MME Sub (test) | Accuracy 87.8 | 22 |
| Multimodal Video Evaluation | Video-MME (test) | Accuracy 88.6 | 22 |
| Long Video Question Answering | Video-MME | Average Accuracy (w/o subs) 75.7 | 22 |
| Video Summarization | Video-MME 900 videos | Overall Accuracy 85.2 | 22 |