CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis

About

Spectral imaging offers promising applications across diverse domains, including medicine and urban scene understanding, and is already established as a critical modality in remote sensing. However, variability in channel dimensionality and captured wavelengths among spectral cameras impedes the development of AI-driven methodologies, leading to camera-specific models with limited generalizability and inadequate cross-camera applicability. To address this bottleneck, we introduce CARL, a model for Camera-Agnostic Representation Learning across RGB, multispectral, and hyperspectral imaging modalities. To enable the conversion of a spectral image with any channel dimensionality to a camera-agnostic representation, we introduce a novel spectral encoder, featuring a self-attention-cross-attention mechanism, to distill salient spectral information into learned spectral representations. Spatio-spectral pre-training is achieved with a novel feature-based self-supervision strategy tailored to CARL. Large-scale experiments across the domains of medical imaging, autonomous driving, and satellite imaging demonstrate our model's unique robustness to spectral heterogeneity, outperforming existing approaches on datasets with simulated and real-world cross-camera spectral variations. The scalability and versatility of the proposed approach position our model as a backbone for future spectral foundation models. Code and model weights are publicly available at https://github.com/IMSY-DKFZ/CARL.
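
The abstract gives no implementation details for the spectral encoder, so the following is only a minimal NumPy sketch of the general idea: self-attention over one token per spectral channel, followed by cross-attention from a fixed number of learned queries, so that the output shape does not depend on the number of channels. Everything here is an assumption made for illustration, not the authors' code: the function names, the sinusoidal wavelength encoding, the single-vector intensity embedding, and the omission of learned query/key/value projections. The actual implementation is in the repository linked above.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, keys, values):
    # Scaled dot-product attention without learned projections (simplification).
    scale = np.sqrt(queries.shape[-1])
    return softmax(queries @ keys.T / scale, axis=-1) @ values


def wavelength_encoding(wavelengths_nm, dim):
    # Hypothetical sinusoidal encoding of each channel's wavelength; dim must be even.
    freqs = 1.0 / (10000.0 ** (np.arange(0, dim, 2) / dim))
    angles = np.asarray(wavelengths_nm, dtype=float)[:, None] * freqs[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)


def encode_spectrum(intensities, wavelengths_nm, learned_queries, intensity_embed):
    """Map a pixel spectrum with any number of channels to a (K, dim) array."""
    dim = learned_queries.shape[-1]
    intensities = np.asarray(intensities, dtype=float)
    # One token per channel: embedded intensity plus wavelength encoding.
    tokens = intensities[:, None] * intensity_embed[None, :]
    tokens = tokens + wavelength_encoding(wavelengths_nm, dim)
    # Self-attention lets channels exchange information (residual connection).
    tokens = tokens + attention(tokens, tokens, tokens)
    # Cross-attention: K learned queries read from the channel tokens.
    return attention(learned_queries, tokens, tokens)


rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 8))   # K = 4 stand-in "learned" queries, dim = 8
embed = rng.normal(size=8)          # stand-in intensity embedding vector

# A 3-channel RGB pixel and a 100-channel hyperspectral pixel (made-up values).
rgb = encode_spectrum([0.2, 0.5, 0.3], [610, 540, 465], queries, embed)
hsi = encode_spectrum(rng.random(100), np.linspace(500, 1000, 100), queries, embed)
print(rgb.shape, hsi.shape)  # both are (4, 8)
```

In this sketch the channel count only appears in the attention keys and values, which is why inputs from cameras with different numbers of bands end up with the same representation size.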

Alexander Baumann, Leonardo Ayala, Silvia Seidlitz, Jan Sellner, Alexander Studier-Fischer, Berkin Özdemir, Lena Maier-Hein, Slobodan Ilic • 2025

Related benchmarks

Task | Dataset | Result | Rank
--- | --- | --- | ---
Medical Organ Segmentation | Tivita Tissue HSI (test) | mIoU 64.6 | 9
Remote Sensing Image Classification | m-bigearthnet | Accuracy 69 | 7
Remote Sensing Image Classification | m-cashew | Accuracy 18.9 | 7
Remote Sensing Image Classification | Sentinel-2 benchmark suite | Rank 1.6 | 7
Semantic segmentation | 11 Remote Sensing Benchmark Datasets 1.0 (aggregated) | Average Rank 1.6 | 7
Urban scene semantic segmentation | HSICity (test) | mIoU 50.1 | 7
Remote Sensing Image Classification | m-SA crop-type | Accuracy 26.5 | 7
Remote Sensing Image Classification | m-eurosat | Accuracy 84.4 | 7
Semantic segmentation | SegMunich in-distribution (test) | mIoU 38.9 | 6
Image Classification | LoveDA Urban (test) | Accuracy 29 | 4
Showing 10 of 15 rows
