
VGGSound

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Video-to-Audio Generation | VGGSound (test) | FAD | 0.52 | 95 |
| Single-source sound localization | VGGSound single-source (test) | IoU@0.5 | 60.2 | 39 |
| Audio-visual Zero-Shot Classification | VGGSound GZSL (test) | S Score | 29.96 | 38 |
| Multi-sound source localization | VGGSound-Duet (test) | CIoU@0.3 | 77.6 | 37 |
| Audio-visual Classification | VGGSound | Top-1 Acc | 69.8 | 37 |
| Video Classification | VGGSound-C unimodal (test) | Accuracy (Gaussian) | 53.14 | 25 |
| Classification | VGGSound-C (test) | Error Rate (Gauss.) | 6.2 | 24 |
| Audio-Visual Event Classification | VGGSound (test) | Fusion Top-1 Acc | 69.1 | 23 |
| Video-to-audio generation | VGGSound | FD_VGG | 0.97 | 22 |
| Multimodal Event Classification | VGGSound-C severity level 5 (test) | Gauss. Corruption Accuracy | 54.9 | 20 |
| Video-to-Audio | VGGSound (test) | FD (PaSST) | 47.38 | 20 |
| Multimodal Retrieval | VGGSound-S (test) | Recall@1 (Video -> Text) | 6.8 | 19 |
| Event Classification (A → V) | VGGSound-AVEL 90K | Precision | 67 | 15 |
| Event Classification (V → A) | VGGSound-AVEL 40K | Precision | 75.3 | 15 |
| Video Retrieval | VGGSound | R@1 | 33.5 | 15 |
| Audio-Visual Captioning | VGGSound Animal | Cs Score | 51.52 | 14 |
| Video Classification | VGGSound-C severity level 5 | Accuracy (Gaussian Blur) | 54.7 | 14 |
| Zero-shot Classification (A+V → T) | VGGSound | Zero-shot Accuracy | 52.7 | 14 |
| Audio-visual Recognition | VGGSound GZSL | S Score | 48.33 | 14 |
| Task-wise classification accuracy | VGGSound-2C bimodal (test) | Accuracy (Gaussian) | 43.74 | 14 |
| Audio-to-Video Retrieval | VGGSound (test) | Recall@1 | 34.9 | 13 |
| Multi-source sound localization | VGGSound Instruments (test) | CIoU@0.1 | 89.6 | 13 |
| Single-source sound localization | VGGSound Instruments (test) | IoU@0.3 | 69.5 | 13 |
| Audio-visual classification | VGGSound Music | Top-1 Accuracy | 71.57 | 12 |
| Event Localization (A → V) | VGGSound AVEL 90K | Segment-level Accuracy | 70.4 | 11 |
Showing 25 of 86 rows