
AudioCaps

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Text-to-audio generation | AudioCaps (test) | FAD 0.77 | 154 |
| text-to-audio retrieval | AudioCaps (test) | Recall@1 66.59 | 152 |
| Audio Captioning | AudioCaps (test) | CIDEr 91.1 | 140 |
| Audio-to-text Retrieval | AudioCaps (test) | R@1 65.6 | 69 |
| Audio Retrieval | AudioCaps | R@1 52 | 50 |
| Audio Captioning | AudioCaps | CIDEr 80.3 | 49 |
| Text-to-audio retrieval | AudioCaps | Recall@1 55.2 | 35 |
| Cross-modal retrieval | AudioCaps (test) | R@1 59.1 | 23 |
| Zero-shot Retrieval (T+V → A) | AudioCaps | Recall@1 95.2 | 14 |
| Zero-shot Retrieval (T+A → V) | AudioCaps | Recall@1 89 | 14 |
| Zero-shot Retrieval (T → A+V) | AudioCaps | Recall@1 45.8 | 14 |
| Audio Editing | AudioCaps | R-MOS 4.43 | 12 |
| Audio Question Answering | AudioCaps-QA (test) | Model-as-Judge Score 60.77 | 12 |
| Text-to-audio generation | AudioCaps (evaluation) | FAD 1.85 | 11 |
| Video-to-Audio Retrieval | AudioCaps V→A | Recall@1 88.3 | 10 |
| Text-to-Video Retrieval | AudioCaps T→V | Recall@1 20.8 | 10 |
| Text-to-Audio Retrieval | AudioCaps 1K 1.0 (test) | Recall@1 52 | 10 |
| Audio Captioning | AudioCaps AudioSet (test) | SPIDEr 48.5 | 10 |
| Automated Audio Captioning | AudioCaps (evaluation) | SPIDEr 51.8 | 9 |
| Audio Steganography | AudioCaps | BER (Original) 0.09 | 8 |
| Neural Audio Compression | AudioCaps (test) | FAD 96.926 | 8 |
| Audio-to-Text Retrieval | AudioCaps 1K 1.0 (test) | R@1 52.4 | 8 |
| Audio-text alignment correlation | AudioCaps (test) | SRCC 0.457 | 7 |
| Retrieval | AudioCaps T+V -> A | Recall@1 95.2 | 6 |
| Sound Reconstruction | AudioCaps (val) | VISQOL Score 3.2 | 6 |
Showing 25 of 49 rows