Long Video Understanding on MLVU (651s)
Accuracy chart: best result 78.1 (Qwen3-VL*), Apr 9, 2026. Updated 9d ago.
Evaluation Results
| Method | Details | Date | Accuracy |
|---|---|---|---|
| Qwen3-VL* | Size=8B, Tokens per fr... | 2026.04 | 78.1 |
| Tempo* | Size=6B, Tokens per fr... | 2026.04 | 75.6 |
| Tempo* | Size=6B, Tokens per fr... | 2026.04 | 75.2 |
| VideoChat-Flash | Size=7B, Tokens per fr... | 2026.04 | 74.7 |
| VideoLLaMA3* | Size=7B, Tokens per fr... | 2026.04 | 73 |
| Storm | Size=7B, Tokens per fr... | 2026.04 | 72.9 |
| BIMBA | Size=7B, Tokens per fr... | 2026.04 | 71.4 |
| LLaVA-Video | Size=7B, Tokens per fr... | 2026.04 | 70.8 |
| InternVL3.5 | Size=8B, Tokens per fr... | 2026.04 | 70.2 |
| Qwen2.5-VL | Size=7B, Tokens per fr... | 2026.04 | 70.2 |
| Qwen3-VL* | Size=2B, Tokens per fr... | 2026.04 | 68.3 |
| LongVU | Size=7B, Tokens per fr... | 2026.04 | 65.4 |
| LLaVA-OneVision | Size=7B, Tokens per fr... | 2026.04 | 64.7 |
| GPT-4o | Size=-, Tokens per fra... | 2026.04 | 64.6 |
| Kangaroo | Size=8B, Tokens per fr... | 2026.04 | 61 |
| LongVA | Size=7B, Tokens per fr... | 2026.04 | 56.3 |
| VideoChat2-HD | Size=7B, Tokens per fr... | 2026.04 | 47.9 |
| LLaMA-VID | Size=7B, Tokens per fr... | 2026.04 | 33.2 |