MSR-VTT

Benchmarks

Task Name	Dataset Name	Metric	SOTA Result	Count
Text-to-Video Retrieval	MSR-VTT	Recall@1	64.4	406
Text-to-Video Retrieval	MSR-VTT (test)	R@1	990	271
Video-to-Text Retrieval	MSR-VTT	Recall@1	64.8	221
Text-to-Video Retrieval	MSR-VTT (1k-A)	R@10	90.6	211
Video Captioning	MSR-VTT (test)	CIDEr	104.2	142
Text-to-Video Generation	MSR-VTT (test)	CLIP Similarity	0.3123	85
Video-to-Text Retrieval	MSR-VTT (1k-A)	Recall@5	84.1	74
Text-to-Video Retrieval	MSR-VTT 1K (test)	R@1	93.61	65
Text-to-Video Retrieval	MSR-VTT 1k-A (test)	R@1	48.5	57
Text-to-Video Retrieval	MSR-VTT (9K)	R@1	52	55
Text-to-Video Retrieval	MSR-VTT (Full)	R@1	34.3	55
Video-to-Text Retrieval	MSR-VTT 9K	R@1	47.7	43
Video Question Answering	MSR-VTT	Accuracy	94.4	42
Video-to-Text Retrieval	MSR-VTT 1K (test)	R@1	53.7	39
Text-to-Video Retrieval	MSR-VTT 1K (val)	R@1	53.3	38
Video-to-Text Retrieval	MSR-VTT (Full)	Recall@1	64.7	38
Video Retrieval	MSR-VTT	R@1	57.7	34
Video-Text Retrieval	MSR-VTT	R@1	78.6	34
Text-to-Video Generation	MSR-VTT	CLIPSIM	0.3204	28
Text-to-Video Retrieval	MSR-VTT 7K	Recall@10	82.8	27
Text-to-Video Generation	MSR-VTT zero-shot	FVD	212	26
Text-to-Video Retrieval	MSR-VTT 1K videos (test)	Recall@10	75.1	25
Text-to-Video Retrieval	MSR-VTT Official full-size (test)	R@1	48.8	24
Cross-modal Retrieval	MSR-VTT (test)	R@5 (V→T)	64.9	23
Cross-modal retrieval (Audio)	MSR-VTT	R@1	42	22

25 of 109 benchmark rows are shown.
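Most rows above report Recall@K (written R@K or Recall@K): the percentage of queries whose correct match ranks in the top K retrieved candidates. The sketch below shows one common way to compute it. It assumes a square similarity matrix `sim` in which query i has exactly one correct candidate, candidate i. The function name `recall_at_k` and the toy scores are hypothetical and illustrative, not any benchmark's official evaluation code. Ties are resolved optimistically: only strictly higher scores push the correct candidate down the ranking.

```python
from typing import Sequence


def recall_at_k(sim: Sequence[Sequence[float]], k: int) -> float:
    """Percentage of queries whose ground-truth candidate ranks in the top k.

    Hypothetical helper. Assumes sim[i][j] scores query i against candidate j
    and that the correct candidate for query i is candidate i (the diagonal).
    """
    hits = 0
    for i, row in enumerate(sim):
        gt_score = row[i]
        # 0-based rank = number of candidates scoring strictly higher than the ground truth.
        rank = sum(1 for s in row if s > gt_score)
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


# Toy 3x3 example: rows are text queries, columns are videos (made-up scores).
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.7],
]

print(recall_at_k(sim, 1))  # text-to-video R@1: 2 of 3 queries correct
print(recall_at_k(sim, 2))  # text-to-video R@2: all 3 queries correct

# Transposing the matrix swaps the query side, giving video-to-text recall.
sim_t = [list(col) for col in zip(*sim)]
print(recall_at_k(sim_t, 1))  # video-to-text R@1: 1 of 3 queries correct
```

Under this sketch, the text-to-video and video-to-text rows differ only in which side of the same similarity matrix is treated as the query, which is why both directions can be reported for the same split.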