Audio-Visual Understanding on AVUT AV-Human
[Chart: Accuracy by date. Labeled point: Gemini 1.5 Pro, Accuracy 0.7834. Date shown: May 30, 2026.]
Updated 1d ago
Evaluation Results
| Method | Modality | Date | Accuracy |
|---|---|---|---|
| Gemini 1.5 Pro | Audio-visual... | 2026.05 | 0.7834 |
| Qwen2-VL-7B | Visual MLLMs | 2026.05 | 0.5838 |
| GPT-4o | Visual MLLMs | 2026.05 | 0.5662 |
| LLaVA-Video-7B | Visual MLLMs | 2026.05 | 0.5652 |
| V-LynX-0.5B | Audio-visual... | 2026.05 | 0.4691 |
| InternVL2-8B | Visual MLLMs | 2026.05 | 0.459 |
| VideoLLaMA2-7B | Audio-visual... | 2026.05 | 0.449 |
| VILA-1.5-8B | Visual MLLMs | 2026.05 | 0.4448 |
| video-SALMONN-13B | Audio-visual... | 2026.05 | 0.3833 |
| SALMONN-13B | Audio MLLMs | 2026.05 | 0.3648 |
| VideoLLaVA-7B | Visual MLLMs | 2026.05 | 0.3314 |
| PandaGPT-13B | Audio-visual... | 2026.05 | 0.2538 |