
UNIV: Unified Foundation Model for Infrared and Visible Modalities

About

Joint RGB-infrared perception is essential for robustness under diverse weather and illumination conditions. Although foundation models excel within a single modality, they degrade substantially when applied across modalities, an issue we attribute to a pattern shortcut, i.e., a modal bias that prioritizes superficial sensor patterns over underlying semantics. To address this problem, we introduce UNIV, a Unified foundation model for Infrared and Visible modalities. At the core of UNIV lies Patch Cross-modal Contrastive Learning (PCCL), a self-supervised contrastive learning strategy that constructs a unified cross-modal feature space. PCCL uses a frozen pre-trained model to sample pseudo patch pairs based on semantic similarity, then aligns infrared and visible representations by attracting semantically related pairs and repelling unrelated ones. This process improves both cross-modal alignment and inter-class semantic separability, guiding the model to focus on semantic structure rather than falling into pattern shortcuts. To further support cross-modal learning, we introduce MVIP, the most comprehensive visible-infrared benchmark to date, containing 98,992 precisely aligned image pairs across diverse scenes. Extensive experiments demonstrate UNIV's superior performance on infrared tasks (+1.7 mIoU for semantic segmentation and +0.7 mAP for detection) while it maintains competitive accuracy on RGB tasks.
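The PCCL objective described above can be illustrated with a toy sketch. This is not the authors' implementation; it is a minimal illustration under stated assumptions: the loss is taken to be InfoNCE-style, each infrared patch's pseudo-positive is the visible patch with the highest cosine similarity under the frozen teacher, and only visible patches whose teacher similarity falls below a hypothetical `neg_threshold` are repelled as "unrelated". The function names, `tau`, and `neg_threshold` are illustrative choices, not values from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def sample_pseudo_pairs(teacher_ir: List[Vector],
                        teacher_rgb: List[Vector]) -> List[Tuple[int, int]]:
    """Assumed pairing rule: for each IR patch, pick the RGB patch that the
    frozen teacher considers most semantically similar."""
    pairs = []
    for i, f_ir in enumerate(teacher_ir):
        sims = [cosine(f_ir, f_rgb) for f_rgb in teacher_rgb]
        j = max(range(len(sims)), key=sims.__getitem__)
        pairs.append((i, j))
    return pairs


def patch_contrastive_loss(student_ir: List[Vector],
                           student_rgb: List[Vector],
                           teacher_ir: List[Vector],
                           teacher_rgb: List[Vector],
                           tau: float = 0.1,
                           neg_threshold: float = 0.5) -> float:
    """Hypothetical InfoNCE-style patch loss: attract each pseudo pair and
    repel RGB patches the teacher judges unrelated to the IR anchor."""
    pairs = sample_pseudo_pairs(teacher_ir, teacher_rgb)
    if not pairs:
        return 0.0
    total = 0.0
    for i, j in pairs:
        # Attraction term: student similarity of the pseudo-positive pair.
        pos = math.exp(cosine(student_ir[i], student_rgb[j]) / tau)
        neg = 0.0
        for k, f_rgb in enumerate(student_rgb):
            if k == j:
                continue
            # Repulsion term: only patches the teacher deems unrelated.
            if cosine(teacher_ir[i], teacher_rgb[k]) < neg_threshold:
                neg += math.exp(cosine(student_ir[i], f_rgb) / tau)
        total += -math.log(pos / (pos + neg))
    return total / len(pairs)
```

With toy 2-D features, a student whose IR and RGB patches already agree with the teacher's pairing gets a loss near zero, while a student that swaps the RGB patches is penalized heavily. Restricting negatives by teacher similarity is one plausible reading of "repelling unrelated ones"; it keeps semantically similar patches from being pushed apart.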

Fangyuan Mao, Shuo Wang, Jilin Mei, Shun Lu, Chen Min, Fuyang Liu, Xiaokun Feng, Meiqi Wu, Yu Hu (2025)

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Semantic segmentation | MSRS | mIoU | 79 | 93 |
| Object detection | M3FD-IR (test) | mAP | 56.9 | 11 |
| Semantic segmentation | MSRS Infrared (test) | mIoU | 76.6 | 11 |
| Semantic segmentation | SODA-IR (test) | mIoU | 69.6 | 8 |
| Semantic segmentation | MFNet-IR (val) | mIoU | 50.78 | 8 |
| Semantic segmentation | MFNet-IR (test) | mIoU | 51.06 | 8 |
| Semantic segmentation | MSRS IR | mIoU | 0.76 | 4 |
| Semantic segmentation | ADE20K RGB | mIoU | 51.2 | 3 |
