Fusing Pixels and Genes: Spatially-Aware Learning in Computational Pathology

About

Recent years have witnessed remarkable progress in multimodal learning within computational pathology. Existing models primarily rely on vision and language modalities; however, language alone lacks molecular specificity and offers limited pathological supervision, leading to representational bottlenecks. In this paper, we propose STAMP, a Spatial Transcriptomics-Augmented Multimodal Pathology representation learning framework that integrates spatially-resolved gene expression profiles to enable molecule-guided joint embedding of pathology images and transcriptomic data. Our study shows that self-supervised, gene-guided training provides a robust and task-agnostic signal for learning pathology image representations. Incorporating spatial context and multi-scale information further enhances model performance and generalizability. To support this, we constructed SpaVis-6M, the largest Visium-based spatial transcriptomics dataset to date, and trained a spatially-aware gene encoder on this resource. Leveraging hierarchical multi-scale contrastive alignment and cross-scale patch localization mechanisms, STAMP effectively aligns spatial transcriptomics with pathology images, capturing spatial structure and molecular variation. We validate STAMP across six datasets and four downstream tasks, where it consistently achieves strong performance. These results highlight the value and necessity of integrating spatially resolved molecular supervision for advancing multimodal learning in computational pathology. The code is included in the supplementary materials. The pretrained weights and SpaVis-6M are available at: https://github.com/Hanminghao/STAMP.
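The abstract describes contrastive alignment between pathology image embeddings and spatial transcriptomics embeddings. As a rough illustration of that idea, the sketch below computes a generic CLIP-style symmetric InfoNCE loss over paired embeddings at a single scale. It is not STAMP's hierarchical multi-scale objective, whose details are not given here. The function name `symmetric_info_nce`, the temperature value, and the use of NumPy are all assumptions made for this example.

```python
import numpy as np


def symmetric_info_nce(img_emb, gene_emb, temperature=0.07):
    """Generic symmetric InfoNCE loss; row i of each array is one matched pair.

    Hypothetical helper for illustration only, not the STAMP implementation.
    """
    # L2-normalize so that dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    gene = gene_emb / np.linalg.norm(gene_emb, axis=1, keepdims=True)

    # (N, N) similarity matrix; matched pairs sit on the diagonal.
    logits = img @ gene.T / temperature

    def cross_entropy_on_diagonal(scores):
        # Numerically stable row-wise log-softmax, then pick the diagonal.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the image-to-gene and gene-to-image directions.
    loss_i2g = cross_entropy_on_diagonal(logits)
    loss_g2i = cross_entropy_on_diagonal(logits.T)
    return 0.5 * (loss_i2g + loss_g2i)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patches = rng.normal(size=(8, 16))   # stand-in image embeddings
    spots = patches.copy()               # perfectly aligned gene embeddings
    print("aligned loss: ", symmetric_info_nce(patches, spots))
    print("shuffled loss:", symmetric_info_nce(patches, spots[::-1]))
```

Matched pairs should yield a much lower loss than mismatched ones, which is the signal such an objective uses to pull corresponding image patches and expression profiles together in the joint embedding space.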

Minghao Han, Dingkang Yang, Linhao Qu, Zizhi Chen, Gang Li, Han Wang, Jiacong Wang, Lihua Zhang • 2026

Related benchmarks

Task                         Dataset       Result                    Rank
Survival Prediction          TCGA-LUAD     C-index 0.6385            116
Survival Analysis            TCGA-LUSC     C-index 0.6321            38
Clustering                   DLPFC         ARI 36.9                  30
WSI Classification           Panda         Accuracy 70.87            23
Linear Probing               DLPFC         Balanced Accuracy 72.1    22
Linear Probing               HBC           Balanced Accuracy 89.9    22
Unsupervised Clustering      HBC           ARI 59                    22
WSI Classification           TCGA-NSCLC    Accuracy 90.87            19
gene expression prediction   PSC           MSE 0.301                 16
gene expression prediction   HHK           MSE 1.233                 16
Showing 10 of 12 rows
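
The survival rows above report a C-index. The listing does not say which variant was used, so the sketch below assumes Harrell's concordance index: among comparable pairs of subjects, the fraction in which the subject with the earlier observed event also has the higher predicted risk, with ties in risk counted as half. The function name `concordance_index` and the toy inputs are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index (assumed variant), computed over all comparable pairs.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if the subject was censored
    risks  : predicted risk scores (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # A pair is only comparable when the earlier subject had an event.
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # correctly ordered pair
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied risk counts as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data: the last subject is censored.
    times = [1, 2, 3, 4]
    events = [1, 1, 1, 0]
    risks = [0.9, 0.7, 0.4, 0.1]
    print(concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives context for reading scores such as 0.6385 on TCGA-LUAD.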
