MSR-VTT

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Text-to-Video Retrieval | MSR-VTT | Recall@1 64.3 | 313 |
| Text-to-Video Retrieval | MSR-VTT (test) | R@1990 | 234 |
| Text-to-Video Retrieval | MSR-VTT (1k-A) | R@10 90.6 | 211 |
| Video-to-Text Retrieval | MSR-VTT | Recall@1 64.8 | 157 |
| Video Captioning | MSR-VTT (test) | CIDEr 104.2 | 121 |
| Text-to-Video Generation | MSR-VTT (test) | CLIP Similarity 0.3123 | 85 |
| Video-to-Text Retrieval | MSR-VTT (1k-A) | Recall@5 84.1 | 74 |
| Text-to-Video Retrieval | MSR-VTT 1k-A (test) | R@1 48.5 | 57 |
| Text-to-Video Retrieval | MSR-VTT (9K) | R@1 52 | 55 |
| Text-to-Video Retrieval | MSR-VTT (Full) | R@1 34.3 | 55 |
| Text-to-Video Retrieval | MSR-VTT 1K (test) | R@1 55.9 | 45 |
| Video-to-Text Retrieval | MSR-VTT 9K | R@1 47.7 | 43 |
| Video Question Answering | MSR-VTT | Accuracy 94.4 | 42 |
| Video-to-Text Retrieval | MSR-VTT 1K (test) | R@1 53.7 | 39 |
| Text-to-Video Retrieval | MSR-VTT 1K (val) | R@1 53.3 | 38 |
| Video-to-Text Retrieval | MSR-VTT (Full) | Recall@1 64.7 | 38 |
| Text-to-Video Generation | MSR-VTT | CLIPSIM 0.3204 | 28 |
| Text-to-Video Retrieval | MSR-VTT 7K | Recall@10 82.8 | 27 |
| Text-to-Video Retrieval | MSR-VTT 1K videos (test) | Recall@10 75.1 | 25 |
| Text-to-Video Retrieval | MSR-VTT Official full-size (test) | R@1 48.8 | 24 |
| Cross-modal retrieval (Audio) | MSR-VTT | R@1 42 | 22 |
| Text-to-Video Generation | MSR-VTT zero-shot | CLIPSIM 32.04 | 20 |
| Video Retrieval | MSR-VTT | R@1 57.7 | 19 |
| Audio-to-Visual Retrieval | MSR-VTT (test) | R@1150 | 18 |
| Text-to-Video Retrieval | MSR-VTT 1k-Yu (test) | R@1 32.4 | 18 |
Showing 25 of 98 rows