| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Video Understanding | Video-MME without subtitles | Overall Score: 84.8 | 108 |
| Video Understanding | Video-MME | Overall Score: 78.18 | 96 |
| Video Understanding | Video-MME | Overall Score: 78.78 | 92 |
| Long Video Understanding | Video-MME Long | Accuracy: 81.9 | 92 |
| General Video Understanding | Video-MME | Accuracy: 87.8 | 82 |
| Video Understanding | Video-MME v1.0 (test) | Score (Short): 72.4 | 56 |
| Video Reasoning | Video-MME | Overall Performance: 73.1 | 55 |
| Long Video Understanding | Video-MME Overall | Accuracy: 87 | 53 |
| Long Video Understanding | Video-MME (full) | Overall Performance: 66.4 | 51 |
| Video Understanding | Video-MME (test) | Accuracy: 88.6 | 51 |
| Video Question Answering | Video-MME | Accuracy (Average, wo/ Subtitle): 82.8 | 48 |
| Long-video understanding | Video-MME | Overall Score: 84.3 | 48 |
| Video Question Answering | Video-MME without subtitles | Accuracy (Overall): 73.3 | 46 |
| Long Video Understanding | Video-MME long 1.0 | Accuracy (No Subs): 67.4 | 45 |
| Video Multimodal Understanding | Video MME | Score: 61.9 | 43 |
| Long Video QA | Video-MME | Average Score: 84.3 | 41 |
| Video Question Answering | Video-MME Long | Accuracy: 82 | 41 |
| Multi-modal Video Evaluation | Video-MME | Accuracy: 75 | 38 |
| Video Understanding | Video-MME | Accuracy: 84.8 | 36 |
| Video Question Answering | Video-MME Long Duration 1.0 | Accuracy (w/o subtitles): 67.4 | 34 |
| Video Understanding | Video-MME w/o sub. | Accuracy: 76.6 | 33 |
| Video Understanding | Video-MME Long | Accuracy (Long, wo Sub): 67.4 | 32 |
| Long Video Understanding | Video MME w/o sub (long) | Accuracy: 71.4 | 30 |
| Long Video Question Answering | Video-MME | Accuracy: 73.2 | 30 |
| Long-form Video Multimodal Evaluation | Video-MME Long | Video-MME-Long Score: 56.6 | 24 |