
Towards Universal Video Retrieval: Generalizing Video Embedding via Synthesized Multimodal Pyramid Curriculum

About

The prevailing video retrieval paradigm is structurally misaligned: narrow benchmarks incentivize correspondingly limited data and single-task training. As a result, universal capability is suppressed, because no diagnostic evaluation defines and demands multi-dimensional generalization. To break this cycle, we introduce a framework built on the co-design of evaluation, data, and modeling. First, we establish the Universal Video Retrieval Benchmark (UVRB), a suite of 16 datasets designed not only to measure performance but also to diagnose critical capability gaps across tasks and domains. Second, guided by UVRB's diagnostics, we introduce a scalable synthesis workflow that generates 1.55 million high-quality pairs to populate the semantic space required for universality. Finally, we devise the Modality Pyramid, a curriculum that trains our General Video Embedder (GVE) by explicitly leveraging the latent interconnections within our diverse data. Extensive experiments show that GVE achieves state-of-the-art zero-shot generalization on UVRB. In particular, our analysis reveals that popular benchmarks are poor predictors of general ability and that partially relevant retrieval is a dominant but overlooked scenario. Overall, our co-designed framework provides a practical path to escape this limited scope and advance toward truly universal video retrieval.

Zhuoning Guo, Mingxin Li, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Xiaowen Chu• 2025

Related benchmarks

Task                      Dataset                      Metric         Result  Rank
Text-to-Video Retrieval   MSRVTT                       Recall@1       46.4    48
Video Retrieval           CRB-T                        R@1            53.9    18
Video Retrieval           CMRB                         R@10           39.8    18
Image-to-Video Retrieval  MSRVTT I2V                   Recall@1       89.9    18
Video Retrieval           DiDeMo                       R@1            43.3    18
Video Retrieval           PEV-K                        R@1            41.3    18
Video Retrieval           VDC-D                        R@1            94.8    18
Video Retrieval           UVRB (average of 16 datasets) Average Score 57.3    18
Video Retrieval           CRB-G                        R@1            86.5    18
Video Retrieval           CRB-S                        R@1            84.7    18

(10 of 21 rows shown.)
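Most rows above report Recall@K (written Recall@K or R@K): the percentage of queries whose correct item appears among the top K retrieved candidates. The sketch below illustrates that metric for an embedding model such as GVE, under stated assumptions: similarity is cosine similarity between precomputed query and candidate embeddings, and each query has exactly one relevant candidate. The helper name `recall_at_k` is hypothetical and not taken from the paper or any released code; the page does not describe UVRB's exact evaluation protocol, which may differ (for example, multiple relevant items per query).

```python
import numpy as np


def recall_at_k(query_emb: np.ndarray, cand_emb: np.ndarray,
                gt: np.ndarray, k: int) -> float:
    """Hypothetical helper: Recall@K as a percentage.

    query_emb: (Q, D) query embeddings (e.g. text or image queries).
    cand_emb:  (N, D) candidate embeddings (e.g. videos).
    gt:        (Q,) index of the single relevant candidate per query
               (an assumption; some benchmarks allow several).
    """
    # L2-normalize so the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    c = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    sims = q @ c.T  # (Q, N) similarity matrix

    # Indices of the k most similar candidates for each query.
    topk = np.argsort(-sims, axis=1)[:, :k]

    # A query is a hit if its ground-truth index is among its top-k.
    hits = (topk == gt[:, None]).any(axis=1)
    return 100.0 * float(hits.mean())


if __name__ == "__main__":
    # Toy data: three candidates on orthogonal axes, three queries.
    cands = np.eye(3)
    queries = np.array([[1.0, 0.0, 0.0],   # closest to candidate 0
                        [0.1, 0.2, 1.0],   # closest to 2, then 1
                        [0.2, 1.0, 0.5]])  # closest to 1
    gt = np.array([0, 1, 1])
    print(recall_at_k(queries, cands, gt, k=1))
    print(recall_at_k(queries, cands, gt, k=2))
```

In the toy data, the second query ranks its relevant candidate second, so it counts as a miss at K=1 but a hit at K=2. This is why R@10 values (as in the CMRB row) are generally higher than R@1 values for the same model and dataset.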
