
VALOR

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Text-to-Audio Retrieval | VALOR | Recall@1 | 36.4 | 24 |
| Text-to-Video Retrieval | VALOR-32K | Recall@1 | 80 | 18 |
| Zero-shot Retrieval (T+V → A) | VALOR | Recall@1 | 78.8 | 14 |
| Zero-shot Retrieval (T+A → V) | VALOR | Recall@1 | 93 | 14 |
| Zero-shot Retrieval (T → A+V) | VALOR | Recall@1 | 76.9 | 14 |
| Audio-Visual Question Answering | VALOR (test) | M.J. Score | 44.67 | 12 |
| Audio-to-Text Retrieval | VALOR | Recall@1 | 35.1 | 9 |
| Captioning | VALOR 32K | CIDEr | 62.8 | 9 |
| Text-to-audiovisual Retrieval | VALOR-32K (test) | Recall@1 | 80.9 | 7 |
| Audio-Visual Captioning | VALOR 32K (val) | BLEU@4 | 16.88 | 7 |
| Retrieval | VALOR T+V -> A | Recall@1 | 78.8 | 6 |
| Retrieval | VALOR T+A -> V | Recall@1 | 93 | 6 |
| Retrieval | VALOR T -> A+V | Recall@1 | 76.8 | 6 |
| Audiovisual Captioning | VALOR-32K | B@4 | 9.6 | 5 |
| Audio-Visual Question Answering | VALOR (test) | CIDEr | 62.2 | 5 |
| Audio-visual captioning | VALOR-32K (test) | CIDEr | 62.2 | 4 |
| Audio-Visual Question Answering | VALOR | M.J. Score | 56.53 | 3 |
| Text-to-Video-Audio Retrieval | VALOR-32K | Recall@1 | 78.7 | 2 |