Audio-Visual Understanding on Video-MME w/ audio
[Chart: Accuracy over time. Highlighted point: Ola, 68.4 Accuracy (Dec 11, 2025).]
Updated 4d ago
Evaluation Results
| Method | Details | Date | Accuracy |
|---|---|---|---|
| Ola | Model Size=7B, Est. To... | 2025.12 | 68.4 |
| Qwen2.5-Omni-7B | Model Size=7B, Est. To... | 2025.12 | 66.3 |
| EchoingPixels | Model Size=7B, Token B... | 2025.12 | 64.1 |
| Qwen2.5-Omni-3B | Model Size=3B, Est. To... | 2025.12 | 63.1 |
| EchoingPixels | Model Size=3B, Token B... | 2025.12 | 60.7 |
| EchoingPixels | Model Size=3B, Token B... | 2025.12 | 58.4 |
| IntraModal | Model Size=7B, Token B... | 2025.12 | 56.5 |
| EchoingPixels | Model Size=3B, Token B... | 2025.12 | 55.7 |
| IntraModal | Model Size=3B, Token B... | 2025.12 | 52.1 |
| FastV | Model Size=3B, Token B... | 2025.12 | 51.9 |