Knowledge-Refined Dual Context-Aware Network for Partially Relevant Video Retrieval

About

Retrieving partially relevant segments from untrimmed videos remains difficult due to two persistent challenges: the mismatch in information density between text and video segments, and limited attention mechanisms that overlook semantic focus and event correlations. We present KDC-Net, a Knowledge-Refined Dual Context-Aware Network that tackles these issues from both textual and visual perspectives. On the text side, a Hierarchical Semantic Aggregation module captures and adaptively fuses multi-scale phrase cues to enrich query semantics. On the video side, a Dynamic Temporal Attention mechanism employs relative positional encoding and adaptive temporal windows to highlight key events with local temporal coherence. Additionally, a dynamic CLIP-based distillation strategy, enhanced with temporal-continuity-aware refinement, ensures segment-aware and objective-aligned knowledge transfer. Experiments on PRVR benchmarks show that KDC-Net consistently outperforms state-of-the-art methods, especially under low moment-to-video ratios.

Junkai Yang, Qirui Wang, Yaoqing Jin, Shuai Ma, Minghan Xu, Shanmin Pang • 2026

Related benchmarks

Task	Dataset	Result
Partially Relevant Video Retrieval	ActivityNet Captions	R@1: 8.1	38
Partially Relevant Video Retrieval	TVR	R@1: 15.4	37
Partially Relevant Video Retrieval	TVR M/V Interval (0, 0.2]	SumR: 184.4	12
Partially Relevant Video Retrieval	TVR M/V Interval (0.2, 0.4]	SumR: 178.5	12
Partially Relevant Video Retrieval	TVR M/V Interval (0.4, 1]	SumR: 183.9	12
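
For readers unfamiliar with the metrics in the table, the following is a minimal sketch of how R@K and SumR are typically computed for text-to-video retrieval. It is not taken from the KDC-Net paper: the function names `recall_at_k` and `sum_r` are hypothetical, and the default cutoffs (1, 5, 10, 100) reflect the convention commonly used in PRVR evaluations, which this page does not itself state.

```python
import numpy as np

def recall_at_k(sim, gt, k):
    """Percentage of queries whose ground-truth video ranks in the top k.

    sim: (num_queries, num_videos) query-to-video similarity scores.
    gt:  (num_queries,) index of the correct video for each query.
    """
    # Sort videos by descending similarity and keep the k best per query.
    topk = np.argsort(-sim, axis=1)[:, :k]
    hits = (topk == gt[:, None]).any(axis=1)
    return 100.0 * hits.mean()

def sum_r(sim, gt, ks=(1, 5, 10, 100)):
    """Sum of R@K over the given cutoffs (assumed PRVR convention)."""
    return sum(recall_at_k(sim, gt, k) for k in ks)

# Toy example: 4 queries, 3 videos (made-up scores, not real results).
sim = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.8, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
])
gt = np.array([0, 1, 2, 2])

r1 = recall_at_k(sim, gt, 1)      # 3 of 4 queries are correct at rank 1
toy_sum = sum_r(sim, gt, ks=(1, 3))
```

Under this reading, a SumR value is an aggregate over several cutoffs, so it is not directly comparable to a single R@1 entry.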
