| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Video Understanding | MLVU | Score 78.19 | 221 |
| Long Video Understanding | MLVU | Score 79.8 | 154 |
| Video Question-Answering | MLVU | Accuracy 76.2 | 143 |
| Video Understanding | MLVU | Accuracy 87.34 | 80 |
| Long Video Understanding | MLVU (dev) | Score 78.1 | 63 |
| Long Video Understanding | MLVU (test) | Average Score 81 | 60 |
| Video Understanding | MLVU 3-120min (test) | Accuracy 47.7 | 49 |
| Video Understanding | MLVU 3-120min (dev) | Accuracy 63 | 49 |
| Video Question Answering | MLVU 78 (test) | Accuracy 76.66 | 45 |
| Multi-discipline Long Video Understanding | MLVU | Score 68.9 | 44 |
| Video Question Answering | MLVU | M-Avg Score 72.4 | 40 |
| Long-video Question Answering | MLVU | M-Avg 79.5 | 39 |
| Video Understanding | MLVU (test) | Average 100.3 | 34 |
| Long Video Understanding | MLVU 3-120 min | Accuracy 82.1 | 23 |
| Long Video Understanding | MLVU MCQ (test) | Accuracy 81.5 | 22 |
| Long Video Understanding | MLVU multiple-choice task | Overall Accuracy 73.4 | 21 |
| Video Understanding | MLVU MCQ (test) | Accuracy 81.5 | 21 |
| Video Understanding Reasoning | MLVU | Accuracy 73.46 | 21 |
| Video Understanding | MLVU | Accuracy 64.7 | 20 |
| Video Question Answering | MLVU (dev) | Accuracy 74.5 | 19 |
| Long Video Understanding | MLVU (651s) | Accuracy 78.1 | 18 |
| Video Understanding | MLVU | Base Accuracy 68.4 | 18 |
| Video Understanding | MLVU | Accuracy 71.4 | 17 |
| Video Question Answering | MLVU MCQ | Accuracy 82.1 | 17 |
| Video Understanding | MLVU (dev) | MLVU Dev Score 68.4 | 17 |