
Clotho

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Audio-to-Text Retrieval | Clotho (test) | R@1 38.6 | 78 |
| Text-to-Audio Retrieval | Clotho (test) | R@1 28.3 | 62 |
| Audio Captioning | Clotho | CIDEr 50.9 | 60 |
| Audio Captioning | Clotho 2.1 (test) | CIDEr 0.496 | 31 |
| Cross-modal retrieval | Clotho (test) | R@1 46.4 | 29 |
| Audio Captioning | Clotho (test) | METEOR 19.7 | 21 |
| Audio Question and Answering | ClothoAQA | Accuracy 85.6 | 20 |
| Audio Retrieval | Clotho | R@1 23.7 | 20 |
| Text-to-Audio Generation | Clotho (test) | FID 17.23 | 17 |
| Text-to-Audio Retrieval | Clotho T→A | Recall@1 24 | 15 |
| Text-to-Audio Retrieval | Clotho V1 | R@1 25.3 | 15 |
| Text-to-audio Retrieval | Clotho V2 (test) | R@1 4.61 | 13 |
| Audio-to-text Retrieval | Clotho V2 (test) | Recall@1 18.78 | 13 |
| Text-to-Audio Retrieval | Clotho V2 | R@1 (%) 27.2 | 13 |
| Automated Audio Captioning | Clotho 2.1 (evaluation) | SPIDEr 33.4 | 12 |
| Automated Audio Captioning | Clotho (evaluation) | SPIDEr 33.2 | 10 |
| Text-to-Audio Retrieval | Clotho 1K 1.0 (test) | R@1 26.9 | 10 |
| Audio Captioning | Clotho (eval) | SPIDEr 31.88 | 9 |
| Audio Captioning | Clotho V2 | CIDEr 51.9 | 9 |
| Audio-to-Text Retrieval | Clotho 1K 1.0 (test) | R@1 27.1 | 8 |
| Audio Captioning | Clotho V1 | B@4 18.5 | 8 |
| Audio Understanding | ClothoAQA | Accuracy 75.16 | 7 |
| Audio Question Answering | Clotho AQA | Score 85.6 | 7 |
| Audio Understanding | Clotho V2 | CIDEr 25.1 | 6 |
| Audio Classification/Retrieval | Clotho | Score 0.042 | 6 |
Showing 25 of 39 rows