AudioCaps

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Text-to-audio generation | AudioCaps (test) | KL Divergence: 0 | 195 |
| text-to-audio retrieval | AudioCaps (test) | Recall@1: 66.59 | 180 |
| Audio Captioning | AudioCaps (test) | CIDEr: 91.1 | 157 |
| Audio-to-text Retrieval | AudioCaps (test) | R@1: 65.6 | 69 |
| Audio Captioning | AudioCaps | CIDEr: 80.3 | 66 |
| Text-to-audio retrieval | AudioCaps | Recall@1: 55.2 | 57 |
| Audio Retrieval | AudioCaps | R@1: 52 | 56 |
| Audio Editing | AudioCaps | FD (Frechet Distance): 12.38 | 24 |
| Cross-modal retrieval | AudioCaps (test) | R@1: 59.1 | 23 |
| Audio-to-Text Retrieval | AudioCaps | R@1: 45.1 | 22 |
| Zero-shot Retrieval (T+V → A) | AudioCaps | Recall@1: 95.2 | 14 |
| Zero-shot Retrieval (T+A → V) | AudioCaps | Recall@1: 89 | 14 |
| Zero-shot Retrieval (T → A+V) | AudioCaps | Recall@1: 45.8 | 14 |
| Text-to-text retrieval | AudioCaps | Recall@1: 50.3 | 13 |
| Audio Editing | AudioCaps | R-MOS: 4.43 | 12 |
| Audio Question Answering | AudioCaps-QA (test) | Model-as-Judge Score: 60.77 | 12 |
| environment-aware text-to-speech | AudioCaps (test) | WER: 6.76 | 11 |
| Audio Question Answering | AudioCaps (test) | Token-Level Accuracy: 60.1 | 11 |
| Audio Understanding | AudioCaps | LB Score: 42.82 | 11 |
| Text-to-audio generation | AudioCaps (evaluation) | FAD: 1.85 | 11 |
| Text-to-Audio | AudioCaps 2019 (test) | FAD: 1.558 | 10 |
| Video-to-Audio Retrieval | AudioCaps V→A | Recall@1: 88.3 | 10 |
| Text-to-Video Retrieval | AudioCaps T→V | Recall@1: 20.8 | 10 |
| Text-to-audio | AudioCaps | FD (OpenL3): 1.86 | 10 |
| Text-to-Audio Retrieval | AudioCaps 1K 1.0 (test) | Recall@1: 52 | 10 |
Showing 25 of 58 rows