| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Video Understanding | MVBench | Accuracy 100 | 425 |
| Video Understanding | MVBench (test) | Accuracy 100.3 | 151 |
| Video Question Answering | MVBench | Accuracy 81.3 | 90 |
| Multi-choice Video Question Answering | MVBench | Avg Accuracy 67.6 | 73 |
| Multi-modal Video Understanding | MVBench | Score 71.2 | 70 |
| Video Question Answering | MVBench (test) | Accuracy 73.6 | 45 |
| Video Question Answering | MVBench | Accuracy 73.2 | 42 |
| Adversarial Attack | MVBench | ASR 83.84 | 37 |
| Short video understanding | MVBench | Accuracy 76.4 | 28 |
| General Video Understanding | MVBench Overall | Accuracy 86.03 | 25 |
| Video Question Answering | MVBench 1.0 (test) | AS Score 77 | 25 |
| Video Reasoning | MVBench | MVBench Score 64.7 | 24 |
| Fine-grained Video Understanding | MVBench | Accuracy 73.8 | 22 |
| Video understanding | MVBench (val) | Accuracy 54.9 | 20 |
| Exocentric Video Understanding | MVBench | Score 73.8 | 13 |
| General spatiotemporal perception | MVbench | Score 49 | 11 |
| Video Question Answering | MVBench | Accuracy 72 | 10 |
| Multi-Choice Q&A | MVBench (val) | Accuracy 72.2 | 9 |
| Speculative Decoding | MVBench | Tau (τ) 3.87 | 8 |
| Video Question Answering | MVBench 2024 (test) | Accuracy 64.6 | 8 |
| Video Understanding | MVBench 64 frame | MVBench Score 70.4 | 8 |
| Multi-modal Video Understanding | MVBench (test) | MVBench Score 60.4 | 7 |
| Visual Understanding | MVBench (full) | Score 71.2 | 6 |
| Video Understanding | MVBench 19 | Overall Score 70.73 | 5 |
| Video Description | MVBench | Average Accepted Length 4.2 | 5 |