Audio Question Answering on ClothoAQA (test)
Best reported accuracy: 71.02 (VideoLLaMA2.1-AV)
Chart: accuracy of reported methods over time (Mar 2024 to Dec 2025)
Evaluation Results
Method            Details                    Date     Accuracy
VideoLLaMA2.1-AV  Size=7B, Training Hour...  2024.06  71.02
VideoLLaMA2-AV    Size=7B, Training Hour...  2024.06  70.11
Qwen2.5-Omni      Model Size=7-8B, Zero-...  2025.12  68
JavisGPT          Model Size=7-8B, #Samp...  2025.12  67.3
VideoLLaMA2.1     Model Size=7-8B, #Samp...  2025.12  66.3
VideoLLaMA2       Model Size=7-8B, #Samp...  2025.12  65.1
Qwen2-Audio       Model Size=7-8B, Zero-...  2025.12  60.9
Qwen-Audio        Size=7B, Training Hour...  2024.06  57.9
Qwen-Audio        Model Size=7-8B, Zero-...  2025.12  57.9
UnifiedIO-2       Model Size=7-8B, #Samp...  2025.12  31.4
NExT-GPT          Model Size=7-8B, #Samp...  2025.12  30.9
InternVideo2_S2   Backbone=InternVideo2_...  2024.03  30.14
MWAFM             Backbone=MWAFM, Finetu...  2024.03  22.24
AquaNet           Backbone=AquaNet, Fine...  2024.03  14.78