| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| Video-MMMU | | Accuracy 84.6 | 45 | 11d ago | |
| Video-MME | OmniJigsaw (CMM) | Overall Performance 73.1 | 39 | 8d ago | |
| Video-Holmes | VideoChat-M1 | Accuracy 60.5 | 37 | 8d ago | |
| Video-Holmes | | Score 46.7 | 34 | 18d ago | |
| SAGE-Bench 1.0 (test) | SAGE-Flash | Overall Score 73.4 | 29 | 1mo ago | |
| Seed-Bench R1 | APPO | Average Answer Score 50.5 | 26 | 1mo ago | |
| LVBench | Triage | LVBench Score 43.3 | 24 | 1mo ago | |
| MVBench | Triage | MVBench Score 64.7 | 24 | 1mo ago | |
| LongVideoBench | Triage | LongVideoBench Score 59 | 24 | 1mo ago | |
| Video-Holmes | VideoSeek | SR 56.1 | 22 | 24d ago | |
| STAR | BoxTuning | Score 67.7 | 19 | 4d ago | |
| MMMU Video | | Accuracy 84.6 | 16 | 1mo ago | |
| SEED-Bench L3 OOD R1 | APPO | Accuracy 49.3 | 16 | 1mo ago | |
| SEED-Bench L2 OOD R1 | APPO | Accuracy 51.6 | 16 | 1mo ago | |
| SEED-Bench-R1 L1 In-Dist. | APPO | Accuracy 50.5 | 16 | 1mo ago | |
| MMVU mc | | Score 82.6 | 16 | 1mo ago | |
| MLVU (test) | OmniJigsaw (CMM) | Accuracy 62.75 | 15 | 8d ago | |
| TUNA-Bench | OmniJigsaw (CMM) | Accuracy 66.2 | 15 | 8d ago | |
| AoT Bench | OmniJigsaw (CMM) | Accuracy 68.9 | 15 | 8d ago | |
| VidHalluc (test) | GPT-4o | Binary QA Accuracy (ACH) 81.15 | 13 | 1mo ago | |
| Video-R1 | AT-RL | VSI 44.3 | 12 | 1mo ago | |
| VBVR-Bench Out-of-Domain | | Average Score 0.988 | 11 | 1mo ago | |
| VBVR-Bench In-Domain | | Average Score 96 | 11 | 1mo ago | |
| VBVR-Bench | | Overall Accuracy 97.4 | 11 | 1mo ago | |
| VSIBench | RLER | Accuracy 43.3 | 10 | 11d ago | |