AVE

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Audio-visual event localization | AVE (test) | Accuracy 83.5 | 37 |
| Audio-Visual Event Localization | AVE | Accuracy 81.1 | 35 |
| Audio-visual event recognition | AVE (test) | AV Accuracy 71.64 | 20 |
| Multimodal Classification | AVE (test) | Multi Acc 65.1 | 14 |
| Multimodal Classification | AVE | Accuracy (%) 73.82 | 12 |
| Video Saliency Prediction | AVE (test) | AUC-J 88.53 | 7 |
| Continual audio-visual sound separation | AVE | SDR 3.55 | 6 |
| Direction Prediction | AVE (test) | Accuracy (10-class) 38.9 | 6 |
| Audio-Visual Classification | AVE | AV Score 71.64 | 6 |
| Emergent modality binding (vi -> te -> au) | AVE (test) | mAP 18.1 | 5 |
| Emergent modality binding (au -> te -> vi) | AVE (test) | mAP 0.168 | 5 |
| Image-to-Audio Retrieval | AVE | mAP 4.13 | 4 |
| Audio-to-Image Retrieval | AVE | mAP 4.11 | 4 |
| Audio localization from visual segment query | AVE | V2A 35.8 | 4 |
| Audio-Visual Event Classification | AVE | Accuracy 0.934 | 4 |
| Audio-Image Retrieval | AVE (test) | mAP 4.46 | 4 |
| Text-to-Audio Retrieval | AVE | Accuracy 28.7 | 3 |
| Audio-to-Text Retrieval | AVE | Accuracy 33.1 | 3 |
| Supervised Event Localization | AVE | Audio-only Accuracy 82.3 | 3 |
| Audio Source Separation | AVE | Human Preference Score 68 | 1 |
Showing 20 of 20 rows