Tevatron 2.0: Unified Document Retrieval Toolkit across Scale, Language, and Modality

About

Recent advancements in large language models (LLMs) have driven interest in billion-scale retrieval models with strong generalization across retrieval tasks and languages. Additionally, progress in large vision-language models has created new opportunities for multimodal retrieval. In response, we have updated the Tevatron toolkit, introducing a unified pipeline that enables researchers to explore retriever models at different scales, across multiple languages, and with various modalities. This demo paper highlights the toolkit's key features, bridging academia and industry by supporting efficient training, inference, and evaluation of neural retrievers. We showcase a unified dense retriever achieving strong multilingual and multimodal effectiveness, and conduct a cross-modality zero-shot study to demonstrate its research potential. Alongside the toolkit, we release OmniEmbed, which is, to the best of our knowledge, the first embedding model that unifies text, image document, video, and audio retrieval, and which serves as a baseline for future research.

Xueguang Ma, Luyu Gao, Shengyao Zhuang, Jiaqi Samantha Zhan, Jamie Callan, Jimmy Lin • 2025

Related benchmarks

Task	Dataset	Metric	Result
Text-to-Audio Retrieval	AudioCaps (test)	Recall@1	34	191
Video Retrieval	MSR-VTT	R@1	51.5	34
Retrieval	MMEB v2	Image Retrieval Score	37.1	18
Video Retrieval	MULTIVENT 2.0	Recall@10	52.3	11
Article Generation	WikiVideo (test)	InfoP Score	88	10
Multimodal Retrieval	WikiVideo (test)	Alpha-nDCG	53	10
