Beyond Simple Edits: Composed Video Retrieval with Dense Modifications

About

Composed video retrieval is a challenging task that aims to retrieve a target video based on a query video and a textual description detailing specific modifications. Standard retrieval frameworks typically struggle to handle the complexity of fine-grained compositional queries and variations in temporal understanding, which limits their retrieval ability in the fine-grained setting. To address this issue, we introduce a novel dataset that captures both fine-grained and composed actions across diverse video segments, enabling more detailed compositional changes in retrieved video content. The proposed dataset, named Dense-WebVid-CoVR, consists of 1.6 million samples with dense modification text that is around seven times more than that of its existing counterpart. We further develop a new model that integrates visual and textual information through Cross-Attention (CA) fusion using a grounded text encoder, enabling precise alignment between dense query modifications and target videos. The proposed model achieves state-of-the-art results, surpassing existing methods on all metrics. Notably, it achieves 71.3% Recall@1 in the visual+text setting and outperforms the state-of-the-art by 3.4%, highlighting its efficacy in leveraging detailed video descriptions and dense modification texts. Our proposed dataset, code, and model are available at: https://github.com/OmkarThawakar/BSE-CoVR

Omkar Thawakar, Dmitry Demidov, Ritesh Thawkar, Rao Muhammad Anwer, Mubarak Shah, Fahad Shahbaz Khan, Salman Khan • 2025

Related benchmarks

Task	Dataset	Result
Composed Video Retrieval	WebVid-CoVR (test)	R@1 63.8	86
Composed Image Retrieval	FashionIQ Zero-Shot	Average R@10 3.23e+3	13
Composed Video Retrieval	CoVR-R (test)	Recall@1 37.9	11
Composed Video Retrieval	EgoCVR Global 1.0	Recall@1 14.6	8
Composed Video Retrieval	EgoCVR 1.0 (Local)	Recall@1 44.8	8
Composed Video Retrieval	Dense WebVid-CoVR (test)	R@1 71.26	3
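
The results above are reported as Recall@K (R@1, R@10). As a rough illustration of how such a metric is commonly computed, and not the authors' evaluation code, the minimal sketch below assumes a query-by-candidate similarity matrix and one ground-truth target index per query. The function name `recall_at_k` and the toy scores are hypothetical.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]],
                target_index: Sequence[int],
                k: int) -> float:
    """Percentage of queries whose target is among the k highest-scoring candidates."""
    if k < 1:
        raise ValueError("k must be at least 1")
    hits = 0
    for scores, target in zip(similarity, target_index):
        # Rank candidate indices by descending similarity (ties keep index order).
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy example (made-up scores): 3 composed queries against 4 candidate videos.
sim = [
    [0.9, 0.1, 0.3, 0.2],  # target 0 is ranked 1st
    [0.2, 0.4, 0.8, 0.1],  # target 1 is ranked 2nd
    [0.5, 0.6, 0.1, 0.7],  # target 2 is ranked 4th
]
targets = [0, 1, 2]
r1 = recall_at_k(sim, targets, k=1)
r2 = recall_at_k(sim, targets, k=2)
```

Under this definition, a Recall@1 of 71.26 would mean the correct target video is the top-ranked candidate for about 71% of the queries.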
