VALOR

Benchmarks

Task Name	Dataset Name	Metric	SOTA Result	Count
Text-to-Audio Retrieval	VALOR	Recall@1	36.4	24
Text-to-Video Retrieval	VALOR-32K	Recall@1	80	18
Zero-shot Retrieval (T+V → A)	VALOR	Recall@1	78.8	14
Zero-shot Retrieval (T+A → V)	VALOR	Recall@1	93	14
Zero-shot Retrieval (T → A+V)	VALOR	Recall@1	76.9	14
Audio-Visual Question Answering	VALOR (test)	M.J. Score	44.67	12
Audio-to-Text Retrieval	VALOR	Recall@1	35.1	9
Captioning	VALOR 32K	CIDEr	62.8	9
Text-to-audiovisual Retrieval	VALOR-32K (test)	Recall@1	80.9	7
Audio-Visual Captioning	VALOR 32K (val)	BLEU@4	16.88	7
Retrieval	VALOR T+V -> A	Recall@1	78.8	6
Retrieval	VALOR T+A -> V	Recall@1	93	6
Retrieval	VALOR T -> A+V	Recall@1	76.8	6
Audiovisual Captioning	VALOR-32K	B@4	9.6	5
Audio-Visual Question Answering	VALOR (test)	CIDEr	62.2	5
Audio-visual captioning	VALOR-32K (test)	CIDEr	62.2	4
Audio-Visual Question Answering	VALOR	M.J. Score	56.53	3
Text-to-Video-Audio Retrieval	VALOR-32K	Recall@1	78.7	2
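Most retrieval rows above report Recall@1. The following is a minimal sketch of the common Recall@K definition: the percentage of queries whose correct candidate appears among the K highest-scoring candidates. It assumes a square similarity matrix where the correct candidate for query i is candidate i. It also assumes a 0 to 100 scale, which the table values appear to use. This is not the VALOR evaluation code. The names `recall_at_k` and `similarity` are hypothetical.

```python
from typing import List


def recall_at_k(similarity: List[List[float]], k: int = 1) -> float:
    """Percentage of queries whose ground-truth candidate ranks in the top k.

    Hypothetical setup: similarity[i][j] scores query i against candidate j,
    and the correct candidate for query i is candidate i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank candidate indices by descending score (stable sort, so ties
        # keep their original index order).
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy example: 3 queries (e.g. text) scored against 3 candidates (e.g. audio).
sim = [
    [0.9, 0.1, 0.3],  # query 0: correct candidate 0 ranks first
    [0.2, 0.4, 0.8],  # query 1: correct candidate 1 ranks second
    [0.1, 0.3, 0.7],  # query 2: correct candidate 2 ranks first
]
print(recall_at_k(sim, k=1))  # 2 of 3 queries hit at rank 1
print(recall_at_k(sim, k=2))  # all 3 queries hit within the top 2
```

In this toy case Recall@1 is about 66.7 and Recall@2 is 100.0. A score such as "Recall@1 36.4" in the table would therefore read as roughly 36.4% of queries retrieving their correct match first, under the assumptions above.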
