AudioCaps

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| text-to-audio retrieval | AudioCaps (test) | Recall@1: 66.59 | 145 |
| Audio Captioning | AudioCaps (test) | CIDEr: 91.1 | 140 |
| Text-to-audio generation | AudioCaps (test) | FAD: 0.77 | 138 |
| Audio-to-text Retrieval | AudioCaps (test) | R@1: 65.6 | 62 |
| Audio Captioning | AudioCaps | CIDEr: 80.3 | 47 |
| Audio Retrieval | AudioCaps | R@1: 52 | 42 |
| Cross-modal retrieval | AudioCaps (test) | R@1: 59.1 | 23 |
| Text-to-audio retrieval | AudioCaps | Recall@1: 55.2 | 19 |
| Zero-shot Retrieval (T+V → A) | AudioCaps | Recall@1: 95.2 | 14 |
| Zero-shot Retrieval (T+A → V) | AudioCaps | Recall@1: 89 | 14 |
| Zero-shot Retrieval (T → A+V) | AudioCaps | Recall@1: 45.8 | 14 |
| Audio Editing | AudioCaps | R-MOS: 4.43 | 12 |
| Audio Question Answering | AudioCaps-QA (test) | Model-as-Judge Score: 60.77 | 12 |
| Video-to-Audio Retrieval | AudioCaps V→A | Recall@1: 88.3 | 10 |
| Text-to-Video Retrieval | AudioCaps T→V | Recall@1: 20.8 | 10 |
| Text-to-Audio Retrieval | AudioCaps 1K 1.0 (test) | Recall@1: 52 | 10 |
| Audio Captioning | AudioCaps AudioSet (test) | SPIDEr: 48.5 | 10 |
| Automated Audio Captioning | AudioCaps (evaluation) | SPIDEr: 51.8 | 9 |
| Neural Audio Compression | AudioCaps (test) | FAD: 96.926 | 8 |
| Audio-to-Text Retrieval | AudioCaps 1K 1.0 (test) | R@1: 52.4 | 8 |
| Retrieval | AudioCaps T+V -> A | Recall@1: 95.2 | 6 |
| Sound Reconstruction | AudioCaps (val) | VISQOL Score: 3.2 | 6 |
| Text-to-Audio | AudioCaps multi-event prompts | FDopenl3: 75.2 | 5 |
| Text-to-audio infilling | AudioCaps (test) | IS (Inception Score): 13.28 | 5 |
| Audio-Text Retrieval | AudioCaps (val) | mAP: 21.98 | 5 |
Showing 25 of 43 rows