TemCoCo: Temporally Consistent Multi-modal Video Fusion with Visual-Semantic Collaboration

About

Existing multi-modal fusion methods typically apply static frame-based image fusion techniques directly to video fusion tasks, neglecting inherent temporal dependencies and leading to inconsistent results across frames. To address this limitation, we propose the first video fusion framework that explicitly incorporates temporal modeling with visual-semantic collaboration to simultaneously ensure visual fidelity, semantic accuracy, and temporal consistency. First, we introduce a visual-semantic interaction module consisting of a semantic branch and a visual branch, with DINOv2 and VGG19 employed for targeted distillation, allowing simultaneous enhancement of both the visual and semantic representations. Second, we are the first to integrate the video degradation enhancement task into the video fusion pipeline, constructing a temporal cooperative module that leverages temporal dependencies to facilitate the recovery of weak information. Third, to ensure temporal consistency, we embed a temporal-enhanced mechanism into the network and devise a temporal loss to guide the optimization process. Finally, we introduce two new evaluation metrics tailored for video fusion that assess the temporal consistency of the generated fused videos. Extensive experimental results on public video datasets demonstrate the superiority of our method. Our code is released at https://github.com/Meiqi-Gong/TemCoCo.
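To make the notion of temporal consistency concrete, here is a minimal sketch of a generic flicker proxy: the average absolute change between consecutive frames. This is an illustrative assumption only. It is not the temporal loss or either of the two metrics proposed in the paper, whose definitions are not given on this page, and the names `Frame`, `mean_abs_frame_diff`, and `flicker_score` are hypothetical.

```python
from typing import Sequence

# Hypothetical representation: one grayscale frame as rows of pixel values.
Frame = Sequence[Sequence[float]]


def mean_abs_frame_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two same-sized frames."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    if count == 0:
        raise ValueError("frames must contain at least one pixel")
    return total / count


def flicker_score(video: Sequence[Frame]) -> float:
    """Average frame-to-frame change over a video (lower means smoother).

    Generic proxy only; not the paper's temporal loss or metrics.
    """
    if len(video) < 2:
        raise ValueError("need at least two frames")
    diffs = [
        mean_abs_frame_diff(video[t], video[t + 1])
        for t in range(len(video) - 1)
    ]
    return sum(diffs) / len(diffs)
```

A proxy like this cannot tell flicker apart from genuine scene motion, so in practice it would likely be compared against the same quantity computed on the source videos rather than read in isolation.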

Meiqi Gong, Hao Zhang, Xunpeng Yi, Linfeng Tang, Jiayi Ma • 2025

Related benchmarks

Task                           Dataset              Result                            Rank
Infrared-Visible Video Fusion  HDO 2024 (test)      BiSWE 6.414                       13
Infrared-Visible Video Fusion  M3SVD                CC 53.5                           13
Infrared-Visible Video Fusion  VTMOT 2025 (test)    BiSWE 8.122                       13
Infrared-Visible Video Fusion  VTMOT                Contrast Contribution (CC) 0.545  13
Infrared-Visible Video Fusion  HDO                  CC 0.585                          13
Infrared-Visible Video Fusion  M3SVD 2025 (test)    BiSWE 6.488                       13
Video Fusion                   VTMOT                QG 28.85                          13
Infrared-Visible Video Fusion  NOT-156              CC 0.236                          13
Infrared-Visible Video Fusion  NOT-156 2025 (test)  BiSWE 4.593                       13
Object Tracking                NOT-156              AUC 18.5                          13
Showing 10 of 17 rows
