UNIV: Unified Foundation Model for Infrared and Visible Modalities

About

Joint RGB-infrared perception is essential for achieving robustness under diverse weather and illumination conditions. Although foundation models excel within single modalities, they suffer from substantial cross-modal degradation, an issue we attribute to a pattern shortcut, i.e., a modal bias that prioritizes superficial sensor patterns over underlying semantics. To address this problem, we introduce UNIV, a Unified foundation model for Infrared and Visible modalities. At the core of UNIV lies Patch Cross-modal Contrastive Learning (PCCL), a self-supervised contrastive learning strategy that constructs a unified cross-modal feature space. PCCL employs a frozen pre-trained model to sample pseudo patch pairs based on semantic similarity, and aligns infrared-visible representations by attracting semantically related pairs while repelling unrelated ones. This process simultaneously enhances cross-modal alignment and inter-class semantic separability, guiding the model to focus on semantic structure rather than falling into pattern shortcuts. To further enable cross-modal learning, we introduce MVIP, the most comprehensive visible-infrared benchmark to date, containing 98,992 precisely aligned image pairs across diverse scenes. Extensive experiments demonstrate UNIV's superior performance on infrared tasks (+1.7 mIoU for semantic segmentation and +0.7 mAP for detection), while maintaining competitive accuracy on RGB tasks.
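The abstract describes PCCL only at a high level. Below is a minimal NumPy sketch of one possible reading, and every detail in it is an assumption. Random arrays stand in for patch features from a frozen pre-trained model and from the model being trained. A visible patch counts as "semantically related" to an infrared patch if it is among that patch's top-`k` cosine-similar visible patches in the frozen feature space. The attract/repel objective is a multi-positive InfoNCE loss with temperature `tau`. The names `sample_pseudo_pairs`, `pccl_loss`, `k`, and `tau` are hypothetical and do not come from UNIV's code.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Normalize each row (one patch embedding per row) to unit length.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def sample_pseudo_pairs(frozen_ir, frozen_rgb, k=1):
    """Hypothetical pair sampler (assumption, not UNIV's code): for each
    infrared patch, mark the k visible patches whose frozen-model features
    are most cosine-similar as pseudo-positive pairs."""
    sim = l2_normalize(frozen_ir) @ l2_normalize(frozen_rgb).T  # (N_ir, N_rgb)
    topk = np.argsort(-sim, axis=1)[:, :k]                       # k most similar per row
    pos_mask = np.zeros_like(sim, dtype=bool)
    np.put_along_axis(pos_mask, topk, True, axis=1)
    return pos_mask

def pccl_loss(student_ir, student_rgb, pos_mask, tau=0.1):
    """Assumed loss form: multi-positive InfoNCE over IR/RGB patch pairs.
    Pseudo-positive pairs are attracted; all other pairs act as negatives."""
    logits = l2_normalize(student_ir) @ l2_normalize(student_rgb).T / tau
    logits = logits - logits.max(axis=1, keepdims=True)          # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Mean log-probability of the positives per IR patch, averaged over patches.
    per_patch = (log_prob * pos_mask).sum(axis=1) / pos_mask.sum(axis=1)
    return -per_patch.mean()

# Toy usage with random stand-in features (16 patches, 32 dims); no real model is involved.
rng = np.random.default_rng(0)
frozen_ir = rng.normal(size=(16, 32))
frozen_rgb = frozen_ir + 0.1 * rng.normal(size=(16, 32))  # toy "aligned" IR/RGB patches
mask = sample_pseudo_pairs(frozen_ir, frozen_rgb, k=1)
loss_aligned = pccl_loss(frozen_ir, frozen_rgb, mask)
loss_random = pccl_loss(rng.normal(size=(16, 32)), rng.normal(size=(16, 32)), mask)
```

In this sketch the frozen model fixes which pairs count as positives while the trained model's features change. Features that already agree on those pairs give a lower loss than unrelated features. A real training setup would compute this on batched tensors with gradients in a deep learning framework rather than in NumPy.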

Fangyuan Mao, Shuo Wang, Jilin Mei, Shun Lu, Chen Min, Fuyang Liu, Xiaokun Feng, Meiqi Wu, Yu Hu • 2025

Related benchmarks

Task	Dataset	Result
Semantic segmentation	MSRS	mIoU 79	120
Object detection	M3FD-IR (test)	mAP 56.9	11
Semantic segmentation	MSRS Infrared (test)	mIoU 76.6	11
Semantic segmentation	SODA-IR (test)	mIoU 69.6	8
Semantic segmentation	MFNet-IR (val)	mIoU 50.78	8
Semantic segmentation	MFNet-IR (test)	mIoU 51.06	8
Semantic segmentation	MSRS IR	mIoU 0.76	4
Semantic segmentation	ADE20K RGB	mIoU 51.2	3
