MSR-VTT

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Text-to-Video Retrieval | MSR-VTT | Recall@1 64.4 | 369 |
| Text-to-Video Retrieval | MSR-VTT (test) | R@1990 | 255 |
| Text-to-Video Retrieval | MSR-VTT (1k-A) | R@10 90.6 | 211 |
| Video-to-Text Retrieval | MSR-VTT | Recall@1 64.8 | 185 |
| Video Captioning | MSR-VTT (test) | CIDEr 104.2 | 128 |
| Text-to-Video Generation | MSR-VTT (test) | CLIP Similarity 0.3123 | 85 |
| Video-to-Text Retrieval | MSR-VTT (1k-A) | Recall@5 84.1 | 74 |
| Text-to-Video Retrieval | MSR-VTT 1K (test) | R@1 93.61 | 65 |
| Text-to-Video Retrieval | MSR-VTT 1k-A (test) | R@1 48.5 | 57 |
| Text-to-Video Retrieval | MSR-VTT (9K) | R@1 52 | 55 |
| Text-to-Video Retrieval | MSR-VTT (Full) | R@1 34.3 | 55 |
| Video-to-Text Retrieval | MSR-VTT 9K | R@1 47.7 | 43 |
| Video Question Answering | MSR-VTT | Accuracy 94.4 | 42 |
| Video-to-Text Retrieval | MSR-VTT 1K (test) | R@1 53.7 | 39 |
| Text-to-Video Retrieval | MSR-VTT 1K (val) | R@1 53.3 | 38 |
| Video-to-Text Retrieval | MSR-VTT (Full) | Recall@1 64.7 | 38 |
| Video Retrieval | MSR-VTT | R@1 57.7 | 31 |
| Text-to-Video Generation | MSR-VTT | CLIPSIM 0.3204 | 28 |
| Text-to-Video Retrieval | MSR-VTT 7K | Recall@10 82.8 | 27 |
| Text-to-Video Generation | MSR-VTT zero-shot | FVD 212 | 26 |
| Text-to-Video Retrieval | MSR-VTT 1K videos (test) | Recall@10 75.1 | 25 |
| Text-to-Video Retrieval | MSR-VTT Official full-size (test) | R@1 48.8 | 24 |
| Cross-modal retrieval (Audio) | MSR-VTT | R@1 42 | 22 |
| Video-Text Retrieval | MSR-VTT | Recall (Text-to-Video) 42.8 | 22 |
| Cross-modal Retrieval | MSR-VTT (test) | R@1 (V→T) 37.3 | 19 |
Showing 25 of 105 rows