DiDeMo

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Text-to-Video Retrieval | DiDeMo | R@1 32.4 | 465 |
| Text-to-Video Retrieval | DiDeMo (test) | R@1 70.5 | 407 |
| Video-to-Text Retrieval | DiDeMo | R@1 71.9 | 136 |
| Video-to-Text Retrieval | DiDeMo (test) | R@1 67.5 | 111 |
| Text-to-Video Retrieval | DiDeMo (DDM) zero-shot | R@1 57 | 36 |
| Text-to-Video Retrieval | DiDeMo (DDM) full (test val) | Recall@1 46.3 | 34 |
| Text-to-Video Retrieval | DiDeMo 1K videos (test) | R@1 66.63 | 21 |
| Retrieval | DiDeMo T+A -> V | Recall@1 82.1 | 20 |
| Average Retrieval | DiDeMo (test) | R@1 19.2 | 19 |
| Audio-to-Text Retrieval | DiDeMo (test) | R@1 5.3 | 19 |
| Text-to-Audio Retrieval | DiDeMo (test) | R@1 5.6 | 19 |
| Audio-to-Video Retrieval | DiDeMo (test) | R@1 19.5 | 19 |
| Video-to-Audio Retrieval | DiDeMo (test) | R@1 20.7 | 19 |
| Video Retrieval | DiDeMo | R@1 46.1 | 18 |
| Video-Text Retrieval | DIDEMO | GFLOPS 44.5 | 18 |
| Text-to-video retrieval | DiDeMo (UTD-split) | Recall@1 35.6 | 17 |
| Video-to-text retrieval | DiDeMo | R@1 (Gaussian) 20.32 | 14 |
| Moment Retrieval | DiDeMo (test) | R@1 (IoU=0.3) 46.3 | 14 |
| Zero-shot Retrieval (T+V → A) | DiDeMo | Recall@1 0.695 | 14 |
| Zero-shot Retrieval (T → A+V) | DiDeMo | Recall@1 53.7 | 14 |
| Video Temporal Grounding | DiDeMo (test) | Recall@1 (IoU=0.3) 69.2 | 11 |
| Text-to-video retrieval | DiDeMo 28s (test) | R@1 38.1 | 11 |
| Video Corpus Moment Retrieval (VCMR) | DiDeMo 14 (test) | Recall@1 (IoU=0.5) 2.26 | 11 |
| Text-to-Video Retrieval | DiDeMo 12 (full-corpus) | R@1 26 | 8 |
| Text-to-Video Retrieval | DiDeMo 12 (test) | R@1 45.3 | 8 |
Showing 25 of 39 rows