
Knowledge-Refined Dual Context-Aware Network for Partially Relevant Video Retrieval

About

Retrieving partially relevant segments from untrimmed videos remains difficult due to two persistent challenges: the mismatch in information density between text and video segments, and limited attention mechanisms that overlook semantic focus and event correlations. We present KDC-Net, a Knowledge-Refined Dual Context-Aware Network that tackles these issues from both textual and visual perspectives. On the text side, a Hierarchical Semantic Aggregation module captures and adaptively fuses multi-scale phrase cues to enrich query semantics. On the video side, a Dynamic Temporal Attention mechanism employs relative positional encoding and adaptive temporal windows to highlight key events with local temporal coherence. Additionally, a dynamic CLIP-based distillation strategy, enhanced with temporal-continuity-aware refinement, ensures segment-aware and objective-aligned knowledge transfer. Experiments on PRVR benchmarks show that KDC-Net consistently outperforms state-of-the-art methods, especially under low moment-to-video ratios.
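To make the video-side idea more concrete, the following is a minimal sketch of local (windowed) self-attention over clip features with an additive relative-position bias. It is not the authors' Dynamic Temporal Attention module: the function name `windowed_attention`, the fixed `window` radius (the abstract describes adaptive windows), and the `rel_bias` lookup table are all assumptions made for illustration.

```python
import math


def windowed_attention(features, window, rel_bias):
    """Self-attention restricted to a local temporal window.

    features: list of T clip vectors (lists of floats), all of length d.
    window:   hypothetical fixed radius; clip i attends to clips i-window..i+window.
    rel_bias: hypothetical dict mapping a relative offset (j - i) to a scalar
              added to the attention score (missing offsets count as 0.0).
    """
    T = len(features)
    d = len(features[0])
    out = []
    for i in range(T):
        lo, hi = max(0, i - window), min(T, i + window + 1)
        neighbours = list(range(lo, hi))
        # Scaled dot-product score plus a bias that depends only on the offset.
        scores = [
            sum(a * b for a, b in zip(features[i], features[j])) / math.sqrt(d)
            + rel_bias.get(j - i, 0.0)
            for j in neighbours
        ]
        # Numerically stable softmax over the window.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        # Weighted average of the neighbouring clip features.
        out.append([
            sum(w / z * features[j][c] for w, j in zip(weights, neighbours))
            for c in range(d)
        ])
    return out
```

With `window=0` each clip attends only to itself and the output equals the input; widening the window lets neighbouring clips contribute, which is one simple way to encourage local temporal coherence.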

Junkai Yang, Qirui Wang, Yaoqing Jin, Shuai Ma, Minghan Xu, Shanmin Pang • 2026

Related benchmarks

Task                                Dataset                        Metric  Result  Rank
Partially Relevant Video Retrieval  TVR                            R@1     15.4    16
Partially Relevant Video Retrieval  ActivityNet Captions           R@1     8.1     16
Partially Relevant Video Retrieval  TVR M/V Interval (0, 0.2]      SumR    184.4   12
Partially Relevant Video Retrieval  TVR M/V Interval (0.2, 0.4]    SumR    178.5   12
Partially Relevant Video Retrieval  TVR M/V Interval (0.4, 1]      SumR    183.9   12
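The benchmark rows above report R@1 and SumR, with some results grouped by moment-to-video (M/V) ratio. The sketch below shows how such numbers are commonly computed, assuming each query has a single ground-truth video with a 1-based rank, and assuming SumR is the sum of R@1, R@5, R@10 and R@100 (a common convention in PRVR work, not confirmed by this page). The function names are hypothetical.

```python
def recall_at_k(ranks, k):
    """Percentage of queries whose ground-truth video is ranked in the top k.

    ranks: 1-based rank of the ground-truth video for each query.
    """
    hits = sum(1 for r in ranks if r <= k)
    return 100.0 * hits / len(ranks)


def sum_r(ranks, ks=(1, 5, 10, 100)):
    """Sum of recalls over the cut-offs in ks (assumed convention)."""
    return sum(recall_at_k(ranks, k) for k in ks)


def mv_ratio(moment_start, moment_end, video_duration):
    """Fraction of the video covered by the relevant moment."""
    return (moment_end - moment_start) / video_duration


def mv_bucket(ratio):
    """Map an M/V ratio to the intervals used in the table."""
    if 0 < ratio <= 0.2:
        return "(0, 0.2]"
    if ratio <= 0.4:
        return "(0.2, 0.4]"
    if ratio <= 1:
        return "(0.4, 1]"
    raise ValueError("M/V ratio must lie in (0, 1]")
```

A low M/V ratio means the relevant moment occupies only a small part of the video, which is the setting the abstract highlights as the hardest.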