| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Video Understanding | MVBench | Accuracy | 90 | 247 |
| Video Understanding | MVBench (test) | Accuracy | 100.3 | 97 |
| Video Question Answering | MVBench | Accuracy | 81.3 | 90 |
| Multi-choice Video Question Answering | MVBench | Avg Accuracy | 67.6 | 73 |
| Multi-modal Video Understanding | MVBench | Score | 68 | 39 |
| Video Question Answering | MVBench (test) | Accuracy | 73.6 | 38 |
| Adversarial Attack | MVBench | ASR | 83.84 | 37 |
| Video Question Answering | MVBench 1.0 (test) | AS Score | 77 | 25 |
| Video Reasoning | MVBench | MVBench Score | 64.7 | 24 |
| Fine-grained Video Understanding | MVBench | Accuracy | 73.8 | 22 |
| Video Question Answering | MVBench | Accuracy | 69.9 | 21 |
| Video understanding | MVBench (val) | Accuracy | 54.9 | 20 |
| General spatiotemporal perception | MVbench | Score | 49 | 11 |
| Video Question Answering | MVBench | Accuracy | 72 | 10 |
| General Video Understanding | MVBench Overall | Accuracy | 86.03 | 9 |
| Speculative Decoding | MVBench | Tau (τ) | 3.87 | 8 |
| Video Question Answering | MVBench 2024 (test) | Accuracy | 64.6 | 8 |
| Video Understanding | MVBench 64 frame | MVBench Score | 70.4 | 8 |
| Short video understanding | MVBench | Accuracy | 76.4 | 8 |
| Visual Understanding | MVBench (full) | Score | 71.2 | 6 |
| Video Description | MVBench | Average Accepted Length | 4.2 | 5 |
| Video Question Answering | MVBench | SC Score | 52 | 4 |
| Video Understanding | MVBench | Pearson Correlation Coefficient (r) | -0.178 | 1 |
| Video Question Answering | MVBench | Scene Transition | 80 | 1 |