VGGSound

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Video-to-Audio Generation | VGGSound (test) | FAD | 0.75 | 62 |
| Audio-visual Zero-Shot Classification | VGGSound GZSL (test) | S Score | 29.96 | 38 |
| Video Classification | VGGSound-C unimodal (test) | Accuracy (Gaussian) | 53.14 | 25 |
| Classification | VGGSound-C (test) | Error Rate (Gauss.) | 6.2 | 24 |
| Audio-visual Classification | VGGSound | Top-1 Acc | 69.8 | 24 |
| Single-source sound localization | VGGSound single-source (test) | IoU@0.5 | 53.7 | 23 |
| Multi-sound source localization | VGGSound-Duet (test) | CIoU@0.3 | 46.9 | 23 |
| Multimodal Event Classification | VGGSound-C severity level 5 (test) | Gauss. Corruption Accuracy | 54.9 | 20 |
| Audio-Visual Event Classification | VGGSound (test) | Fusion Top-1 Acc | 65.8 | 18 |
| Video Retrieval | VGGSound | R@1 | 33.5 | 15 |
| Zero-shot Classification (A+V → T) | VGGSound | Zero-shot Accuracy | 52.7 | 14 |
| Audio-visual Recognition | VGGSound GZSL | S Score | 48.33 | 14 |
| Task-wise classification accuracy | VGGSound-2C bimodal (test) | Accuracy (Gaussian) | 43.74 | 14 |
| Multi-source sound localization | VGGSound Instruments (test) | CIoU@0.1 | 89.6 | 13 |
| Single-source sound localization | VGGSound Instruments (test) | IoU@0.3 | 69.5 | 13 |
| Audio-visual classification | VGGSound Music | Top-1 Accuracy | 71.57 | 12 |
| Text-to-Audio | VGGSound-Omni (test) | KL Divergence | 1.35 | 10 |
| Cross-modal Generation | VGGSound | Average Score | 87.23 | 9 |
| Video-to-Audio | VGGSound (test) | APCC-Δ | 0.758 | 9 |
| Sound source localization | VGGSound Source | cIoU | 40.6 | 9 |
| Sound Localization | VGGSound Single 1.0 (test) | IoU@0.5 | 40.8 | 9 |
| Sound Localization | VGGSound-Instruments 1.0 (test) | IoU@0.3 | 55.3 | 9 |
| Multi-source sound localization | VGGSound-Duet | CIoU@0.3 | 26.2 | 9 |
| Multi-source sound localization | VGGSound Instruments | CIoU@0.1 | 77.5 | 9 |
| Zero-shot Classification (A → T) | VGGSound | Accuracy | 47.1 | 8 |
Showing 25 of 58 rows