
Clotho

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Audio-to-Text Retrieval | Clotho (test) | R@1 38.6 | 85 |
| Audio Captioning | Clotho | CIDEr 50.9 | 82 |
| Text-to-Audio Retrieval | Clotho (test) | R@1 28.3 | 78 |
| Audio-to-Text Retrieval | Clotho | R@1 26.5 | 49 |
| Audio Captioning | Clotho (test) | METEOR 19.7 | 43 |
| Audio Retrieval | Clotho | R@1 23.7 | 33 |
| Text-to-Audio Retrieval | Clotho | R@1 0.212 | 31 |
| Audio Captioning | Clotho 2.1 (test) | CIDEr 0.496 | 31 |
| Cross-modal retrieval | Clotho (test) | R@1 46.4 | 29 |
| Audio Question and Answering | ClothoAQA | Accuracy 85.6 | 20 |
| Text-to-Audio Generation | Clotho (test) | FID 17.23 | 17 |
| Text-to-Audio Retrieval | Clotho T→A | Recall@1 24 | 15 |
| Text-to-Audio Retrieval | Clotho V1 | R@1 25.3 | 15 |
| Audio Hallucination Evaluation | Clotho-1K | HR 16.98 | 14 |
| Audio Understanding | ClothoAQA | Accuracy 75.16 | 14 |
| Text-to-text retrieval | Clotho | R@1 64.52 | 13 |
| Text-to-Audio Retrieval | Clotho (evaluation) | R@1 22.87 | 13 |
| Text-to-audio Retrieval | Clotho V2 (test) | R@1 4.61 | 13 |
| Audio-to-text Retrieval | Clotho V2 (test) | Recall@1 18.78 | 13 |
| Text-to-Audio Retrieval | Clotho V2 | R@1 (%) 27.2 | 13 |
| Automated Audio Captioning | Clotho | AAC Score 55.92 | 12 |
| Automated Audio Captioning | Clotho 2.1 (evaluation) | SPIDEr 33.4 | 12 |
| Audio Question Answering | Clotho (test) | Token-Level Accuracy 52.8 | 11 |
| Audio Captioning | Clotho V2 | CIDEr 52 | 11 |
| Watermark Detection | Clotho 1.0 (test) | Perth 100 | 10 |
Showing 25 of 53 rows