DiDeMo

Benchmarks

Task Name	Dataset Name	SOTA Result
Text-to-Video Retrieval	DiDeMo	R@1 32.4	472
Text-to-Video Retrieval	DiDeMo (test)	R@1 70.5	407
Video-to-Text Retrieval	DiDeMo	R@1 71.9	136
Video-to-Text Retrieval	DiDeMo (test)	R@1 67.5	111
Text-to-Video Retrieval	DiDeMo (DDM) zero-shot	R@1 57	36
Text-to-Video Retrieval	DiDeMo (DDM) full (test val)	Recall@1 46.3	34
Text-to-Video Retrieval	DiDeMo 1K videos (test)	R@1 66.63	21
Retrieval	DiDeMo T+A -> V	Recall@1 82.1	20
Average Retrieval	DiDeMo (test)	R@1 19.2	19
Audio-to-Text Retrieval	DiDeMo (test)	R@1 5.3	19
Text-to-Audio Retrieval	DiDeMo (test)	R@1 5.6	19
Audio-to-Video Retrieval	DiDeMo (test)	R@1 19.5	19
Video-to-Audio Retrieval	DiDeMo (test)	R@1 20.7	19
Video Retrieval	DiDeMo	R@1 46.1	18
Video-Text Retrieval	DIDEMO	GFLOPS 44.5	18
Text-to-video retrieval	DiDeMo (UTD-split)	Recall@1 35.6	17
Video-to-text retrieval	DiDeMo	R@1 (Gaussian) 20.32	14
Moment Retrieval	DiDeMo (test)	R@1 (IoU=0.3) 46.3	14
Zero-shot Retrieval (T+V → A)	DiDeMo	Recall@1 0.695	14
Zero-shot Retrieval (T → A+V)	DiDeMo	Recall@1 53.7	14
Video Temporal Grounding	DiDeMo (test)	Recall@1 (IoU=0.3) 69.2	11
Text-to-video retrieval	DiDeMo 28s (test)	R@1 38.1	11
Video Corpus Moment Retrieval (VCMR)	DiDeMo 14 (test)	Recall@1 (IoU=0.5) 2.26	11
Text-to-Video Retrieval	DiDeMo 12 (full-corpus)	R@1 26	8
Text-to-Video Retrieval	DiDeMo 12 (test)	R@1 45.3	8

Showing 25 of 42 rows