
Clotho

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Audio-to-Text Retrieval | Clotho (test) | R@1 38.6 | 85 |
| Text-to-Audio Retrieval | Clotho (test) | R@1 28.3 | 69 |
| Audio Captioning | Clotho | CIDEr 50.9 | 60 |
| Audio Captioning | Clotho 2.1 (test) | CIDEr 0.496 | 31 |
| Cross-modal retrieval | Clotho (test) | R@1 46.4 | 29 |
| Audio Retrieval | Clotho | R@1 23.7 | 28 |
| Audio Captioning | Clotho (test) | METEOR 19.7 | 21 |
| Audio Question and Answering | ClothoAQA | Accuracy 85.6 | 20 |
| Text-to-Audio Generation | Clotho (test) | FID 17.23 | 17 |
| Text-to-Audio Retrieval | Clotho T→A | Recall@1 24 | 15 |
| Text-to-Audio Retrieval | Clotho V1 | R@1 25.3 | 15 |
| Audio Hallucination Evaluation | Clotho-1K | HR 16.98 | 14 |
| Audio Understanding | ClothoAQA | Accuracy 75.16 | 14 |
| Text-to-audio Retrieval | Clotho V2 (test) | R@1 4.61 | 13 |
| Audio-to-text Retrieval | Clotho V2 (test) | Recall@1 18.78 | 13 |
| Text-to-Audio Retrieval | Clotho V2 | R@1 (%) 27.2 | 13 |
| Automated Audio Captioning | Clotho | AAC Score 55.92 | 12 |
| Automated Audio Captioning | Clotho 2.1 (evaluation) | SPIDEr 33.4 | 12 |
| Audio Captioning | Clotho V2 | CIDEr 52 | 11 |
| Audio temporal grounding | Clotho-Moment | R@0.3 93.6 | 10 |
| Automated Audio Captioning | Clotho (evaluation) | SPIDEr 33.2 | 10 |
| Text-to-Audio Retrieval | Clotho 1K 1.0 (test) | R@1 26.9 | 10 |
| Text-to-Audio Retrieval | Clotho | R@1 0.168 | 10 |
| Audio Captioning | Clotho (eval) | SPIDEr 31.88 | 9 |
| Audio-to-Text Retrieval | Clotho 1K 1.0 (test) | R@1 27.1 | 8 |
Showing 25 of 46 rows