| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| Video-MMMU | | Accuracy 84.6 | 32 | 4d ago | |
| SAGE-Bench 1.0 (test) | SAGE-Flash | Overall Score 73.4 | 29 | 4d ago | |
| LVBench | Triage | LVBench Score 43.3 | 24 | 4d ago | |
| MVBench | Triage | MVBench Score 64.7 | 24 | 4d ago | |
| LongVideoBench | Triage | LongVideoBench Score 59 | 24 | 4d ago | |
| Video-MME | Triage | Short Query Performance 72.4 | 24 | 3d ago | |
| Video-Holmes | | Score 46.7 | 20 | 3d ago | |
| MMVU mc | | Score 82.6 | 16 | 3d ago | |
| Video-Holmes | Qwen3-VL-8B-Thinking + MVP | Accuracy 42.6 | 14 | 4d ago | |
| VidHalluc (test) | GPT-4o | Binary QA Accuracy (ACH) 81.15 | 13 | 4d ago | |
| Video-R1 | AT-RL | VSI 44.3 | 12 | 3d ago | |
| Seed-Bench R1 | Qwen3VL | L1 Answer Score 1.95 | 10 | 4d ago | |
| TwiFF-Bench | TwiFF | Instructional CoT 2.81 | 10 | 4d ago | |
| MINERVA 600+s (test) | VideoChat-R1.5-7B | Accuracy 31.8 | 8 | 4d ago | |
| MINERVA 0-600s (test) | Qwen2.5-VL-7B-Instruct | Accuracy 37.8 | 8 | 4d ago | |
| MINERVA overall (test) | VideoChat-R1.5-7B | Accuracy 33.8 | 8 | 4d ago | |
| MoVid-Bench Video Expected Comparison 1.0 | | Body Accuracy 100 | 5 | 4d ago | |
| EgoSchema (test) | MECOT | Accuracy 62.25 | 4 | 4d ago | |
| MMVU | | Accuracy 75.8 | 3 | 4d ago | |
| DREAM 1k | VideoLLaMA 2 | F1 Score 27.1 | 2 | 4d ago | |