Diffusion Generative Modeling for Spatially Resolved Gene Expression Inference from Histology Images
About
Spatial Transcriptomics (ST) enables high-resolution measurement of RNA sequence abundance by systematically connecting the cell morphology depicted in Hematoxylin and Eosin (H&E) stained histology images to spatially resolved gene expression. ST is a time-consuming and expensive, yet powerful, experimental technique that provides new opportunities to understand cancer mechanisms at a fine-grained molecular level, which is critical for uncovering new approaches to disease diagnosis and treatment. Here, we present Stem (SpaTially resolved gene Expression inference with diffusion Model), a novel computational tool that leverages a conditional diffusion generative model to enable in silico gene expression inference from H&E stained images. By better capturing the inherent stochasticity and heterogeneity in ST data, Stem achieves state-of-the-art performance on spatial gene expression prediction and generates biologically meaningful gene profiles for new H&E stained images at test time. We evaluate the proposed algorithm on datasets with various tissue sources and sequencing platforms, where it demonstrates clear improvement over existing approaches. Stem generates high-fidelity gene expression predictions whose gene variation levels are similar to those of the ground truth data, suggesting that our method preserves the underlying biological heterogeneity. Our proposed pipeline opens up the possibility of analyzing existing, easily accessible H&E stained histology images from a genomics point of view without physically performing gene expression profiling, and it empowers potential biological discovery from H&E stained histology images.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| gene expression prediction | IDC Top 50 HVGs | Macro-Avg PCC | 0.178 | 8 |
| Spatial Transcriptomics Prediction | HEST-1K Kidney 1.0 (test) | -- | -- | 6 |
| MSI Status Classification | TCGA-COADREAD (external independent dataset) | Score (MSI-H) | 0.584 | 5 |
| Spatial Transcriptomics Prediction | HEST-1K Colorectum 1.0 (test) | PCC@10 | 0.67 | 5 |
| Spatial Transcriptomics Prediction | HEST-1K Lung 1.0 (test) | PCC@10 | 0.546 | 5 |
| Spatial Transcriptomics Prediction | HEST-1K Skin 1.0 (test) | PCC@10 | 0.782 | 5 |
| gene expression prediction | HCC Top 50 HVGs | Macro-Average PCC | 0.098 | 4 |
| gene expression prediction | CCRCC Top 50 HVGs | Macro-Avg PCC | 0.124 | 4 |
| gene expression prediction | COAD Top 50 HVGs | Macro PCC | 0.236 | 4 |
| gene expression prediction | LUNG Top 50 HVGs | Macro PCC | 0.22 | 4 |
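Several rows above report a macro-averaged Pearson correlation coefficient (PCC) over a set of genes. The sketch below shows one common way such a score can be computed: a PCC per gene across spots, then an unweighted mean over genes. The function name `macro_avg_pcc`, the `(spots, genes)` array layout, and the choice to skip genes with zero variance are assumptions made for illustration. The source does not describe the exact evaluation protocol of any benchmark listed here (gene selection, normalization, or per-slide versus pooled averaging), so this is not presented as their implementation.

```python
import numpy as np


def macro_avg_pcc(y_true, y_pred, eps=1e-12):
    """Return (macro-average PCC, per-gene PCC array).

    Both inputs are arrays of shape (n_spots, n_genes). This layout is an
    assumption of the sketch, not something specified by the source.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.ndim != 2 or y_true.shape != y_pred.shape:
        raise ValueError("expected two arrays of identical shape (n_spots, n_genes)")

    # Center each gene (column) across spots.
    t = y_true - y_true.mean(axis=0)
    p = y_pred - y_pred.mean(axis=0)

    # Per-gene covariance numerator and product of norms.
    num = (t * p).sum(axis=0)
    denom = np.sqrt((t ** 2).sum(axis=0) * (p ** 2).sum(axis=0))

    # PCC is undefined when either column is constant; mark those genes as NaN
    # and leave them out of the average (an illustrative choice).
    per_gene = np.full(y_true.shape[1], np.nan)
    valid = denom > eps
    per_gene[valid] = num[valid] / denom[valid]

    return float(np.nanmean(per_gene)), per_gene


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = rng.normal(size=(200, 50))                        # 200 spots, 50 genes
    pred = truth + rng.normal(scale=2.0, size=truth.shape)    # noisy predictions
    score, per_gene = macro_avg_pcc(truth, pred)
    print(f"macro-average PCC over {per_gene.size} genes: {score:.3f}")
```

Because the mean is unweighted, every gene contributes equally regardless of its expression level, which is one reason macro-averaged scores from different gene sets or datasets are not directly comparable.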