CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis

About

Spectral imaging offers promising applications across diverse domains, including medicine and urban scene understanding, and is already established as a critical modality in remote sensing. However, variability in channel dimensionality and captured wavelengths among spectral cameras impedes the development of AI-driven methodologies, leading to camera-specific models with limited generalizability and inadequate cross-camera applicability. To address this bottleneck, we introduce CARL, a model for Camera-Agnostic Representation Learning across RGB, multispectral, and hyperspectral imaging modalities. To enable the conversion of a spectral image with any channel dimensionality to a camera-agnostic representation, we introduce a novel spectral encoder, featuring a self-attention-cross-attention mechanism, to distill salient spectral information into learned spectral representations. Spatio-spectral pre-training is achieved with a novel feature-based self-supervision strategy tailored to CARL. Large-scale experiments across the domains of medical imaging, autonomous driving, and satellite imaging demonstrate our model's unique robustness to spectral heterogeneity, outperforming existing approaches on datasets with simulated and real-world cross-camera spectral variations. The scalability and versatility of the proposed approach position our model as a backbone for future spectral foundation models. Code and model weights are publicly available at https://github.com/IMSY-DKFZ/CARL.
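
To illustrate why a cross-attention step can yield a camera-agnostic representation, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes each spectral channel has already been embedded as a token, and it uses random vectors as stand-ins for the learned queries; the names `cross_attend`, `DIM`, and `NUM_QUERIES` and all sizes are hypothetical. The self-attention among channel tokens and any wavelength encoding are omitted. The point is only that a fixed set of queries produces an output of fixed size, whatever the number of input channels.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, tokens):
    """Each query attends over all channel tokens.

    The output has one row per query, independent of len(tokens).
    """
    dim = len(queries[0])
    out = []
    for q in queries:
        # Scaled dot-product score between this query and every token.
        scores = [sum(a * b for a, b in zip(q, t)) / math.sqrt(dim) for t in tokens]
        weights = softmax(scores)
        # Weighted sum of the tokens (tokens act as both keys and values here).
        out.append([sum(w * t[d] for w, t in zip(weights, tokens)) for d in range(dim)])
    return out


def random_tokens(rng, count, dim):
    """Random vectors standing in for embedded channels or learned queries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
DIM, NUM_QUERIES = 8, 4  # hypothetical sizes, not taken from the paper
queries = random_tokens(rng, NUM_QUERIES, DIM)

rgb = random_tokens(rng, 3, DIM)      # camera with 3 channels
hyper = random_tokens(rng, 100, DIM)  # camera with 100 channels

rgb_repr = cross_attend(queries, rgb)
hyper_repr = cross_attend(queries, hyper)
```

Both `rgb_repr` and `hyper_repr` contain `NUM_QUERIES` vectors of length `DIM`, so downstream layers see the same shape for either camera.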

Alexander Baumann, Leonardo Ayala, Silvia Seidlitz, Jan Sellner, Alexander Studier-Fischer, Berkin Özdemir, Lena Maier-Hein, Slobodan Ilic • 2025

Related benchmarks

Task	Dataset	Result
Image Classification	m-forestnet (test)	--	13
Semantic segmentation	DESIS-CDL	mIoU 58.5	11
Semantic segmentation	EnMAP BNTD	mIoU 36.3	11
Semantic segmentation	EnMAP TreeMap	mIoU 35.6	11
Semantic segmentation	EnMAP BD-Foret	mIoU 52.3	11
Semantic segmentation	EnMAP EuCrops	mIoU 47	11
Semantic segmentation	EnMAP NLCD	mIoU 34.3	11
Semantic segmentation	SpectralEarth Benchmarks Aggregate	Rank 7.3	11
Semantic segmentation	EnMAP CDL	mIoU 57.4	11
Semantic segmentation	GF-5 Wuhan	mIoU 40.4	11

Showing 10 of 26 rows
