| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Video Understanding | MVBench | Accuracy 100 | 563 |
| Video Understanding | MVBench (test) | Accuracy 100.3 | 190 |
| Video Question Answering | MVBench | Accuracy 81.3 | 90 |
| Multi-modal Video Understanding | MVBench | Accuracy 82.9 | 83 |
| Multi-choice Video Question Answering | MVBench | Avg Accuracy 67.6 | 73 |
| Video Question Answering | MVBench | Accuracy 77.1 | 69 |
| Video Question Answering | MVBench (test) | Accuracy 73.6 | 45 |
| Video Reasoning | MVBench | MVBench Score 64.7 | 39 |
| Adversarial Attack | MVBench | ASR 83.84 | 37 |
| Video Understanding | MVBench | Prefilling FLOPs (T) 6.9 | 35 |
| Short video understanding | MVBench | Accuracy 76.4 | 28 |
| Video Understanding | MVBench zero-shot | Accuracy 69.7 | 25 |
| General Video Understanding | MVBench Overall | Accuracy 86.03 | 25 |
| Video Question Answering | MVBench 1.0 (test) | AS Score 77 | 25 |
| Fine-grained Video Understanding | MVBench | Accuracy 73.8 | 22 |
| Video Question Answering | MVBench | Average Score 61.2 | 20 |
| Video understanding | MVBench (val) | Accuracy 54.9 | 20 |
| Multiple-choice question answering | MVBench | Accuracy 58.54 | 15 |
| Exocentric Video Understanding | MVBench | Score 73.8 | 13 |
| General spatiotemporal perception | MVbench | Score 49 | 11 |
| Video Question Answering | MVBench | Accuracy 72 | 10 |
| Multi-Choice Q&A | MVBench (val) | Accuracy 72.2 | 9 |
| Speculative Decoding | MVBench | Tau (τ) 3.87 | 8 |
| Video Question Answering | MVBench 2024 (test) | Accuracy 64.6 | 8 |
| Video Understanding | MVBench 64 frame | MVBench Score 70.4 | 8 |