
HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis

About

Spatial transcriptomics (ST) enables interrogating the molecular composition of tissue with ever-increasing resolution and sensitivity. However, costs, rapidly evolving technology, and a lack of standards have constrained computational methods in ST to narrow tasks and small cohorts. In addition, the underlying tissue morphology, as reflected by H&E-stained whole-slide images (WSIs), encodes rich information that is often overlooked in ST studies. Here, we introduce HEST-1k, a collection of 1,229 spatial transcriptomic profiles, each linked to a WSI and extensive metadata. HEST-1k was assembled from 153 public and internal cohorts encompassing 26 organs, two species (Homo sapiens and Mus musculus), and 367 cancer samples from 25 cancer types. Processing HEST-1k enabled the identification of 2.1 million expression-morphology pairs and over 76 million nuclei. To support its development, we additionally introduce the HEST-Library, a Python package designed to perform a range of actions with HEST samples. We test HEST-1k and HEST-Library on three use cases: (1) benchmarking foundation models for pathology (HEST-Benchmark), (2) biomarker exploration, and (3) multimodal representation learning. HEST-1k, HEST-Library, and HEST-Benchmark can be freely accessed at https://github.com/mahmoodlab/hest.
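The abstract counts 2.1 million expression-morphology pairs but does not spell out how a pair is formed. The sketch below is a minimal illustration, assuming a pair is a fixed-size H&E patch centred on one ST spot together with that spot's gene expression vector. The 224-pixel patch size, the (x, y) pixel-coordinate convention, the WSI held as an in-memory NumPy array, and the names `ExpressionMorphologyPair` and `extract_pairs` are all hypothetical. None of these are part of HEST-Library's API, and the sketch is not its patching pipeline.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ExpressionMorphologyPair:
    """Hypothetical pair: an H&E patch centred on one ST spot plus that spot's expression."""
    patch: np.ndarray       # (patch_size, patch_size, 3) RGB crop from the WSI
    expression: np.ndarray  # (n_genes,) expression values measured at the spot


def extract_pairs(wsi: np.ndarray, spot_xy: np.ndarray, counts: np.ndarray,
                  patch_size: int = 224) -> list[ExpressionMorphologyPair]:
    """Crop a square patch around each spot centre and pair it with that spot's expression.

    Assumes spot_xy holds (x, y) pixel coordinates in the same frame as `wsi`.
    Spots whose patch would extend past the image border are skipped.
    """
    half = patch_size // 2
    height, width = wsi.shape[:2]
    pairs = []
    for (x, y), expr in zip(spot_xy.astype(int), counts):
        top, left = y - half, x - half
        if top < 0 or left < 0 or top + patch_size > height or left + patch_size > width:
            continue  # patch would fall outside the slide
        patch = wsi[top:top + patch_size, left:left + patch_size]
        pairs.append(ExpressionMorphologyPair(patch=patch, expression=expr))
    return pairs


# Toy usage with synthetic data (not HEST samples).
rng = np.random.default_rng(0)
wsi = rng.integers(0, 256, size=(1000, 1000, 3), dtype=np.uint8)
spot_xy = np.array([[500, 500], [50, 50], [700, 120]])
counts = rng.poisson(2.0, size=(3, 50)).astype(float)
pairs = extract_pairs(wsi, spot_xy, counts)
print(len(pairs))  # 2: the spot at (50, 50) is too close to the border
```

Skipping border spots keeps every patch the same shape, which is convenient for batching into a pathology encoder. Padding the slide instead would keep those spots at the cost of partially empty patches.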

Guillaume Jaume, Paul Doucet, Andrew H. Song, Ming Y. Lu, Cristina Almagro-Pérez, Sophia J. Wagner, Anurag J. Vaidya, Richard J. Chen, Drew F.K. Williamson, Ahrong Kim, Faisal Mahmood • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Survival prediction | TCGA-BRCA | C-index | 0.653 | 115 |
| Cancer subtyping | TCGA-BRCA (test) | -- | -- | 17 |
| Patch-level classification | Colorectal cancer (test) | Balanced accuracy (%) | 98.2 | 11 |
| Gene expression prediction | Breast-ST2 HEST-1k (test) | MAE | 0.452 | 10 |
| Gene expression prediction | Lung-ST HEST-1k (test) | MAE | 0.661 | 10 |
| Gene expression prediction (top 250 cancer-specific genes) | Breast-ST1 | MAE | 0.362 | 10 |
| Gene expression prediction (top 250 cancer-specific genes) | Breast-ST2 | MAE | 0.417 | 10 |
| Gene expression prediction | Breast-ST1 HEST-1k (test) | MAE | 0.482 | 10 |
| Gene expression prediction (top 250 cancer-specific genes) | Lung-ST | MAE | 0.301 | 10 |
| Gene expression prediction | Breast-ST1 | MAE | 0.349 | 8 |

Showing 10 of 19 rows.
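The benchmark rows report three metrics without defining them: MAE (lower is better), balanced accuracy and Harrell's C-index (higher is better). The sketch below gives standard textbook definitions for reference. It is not the HEST-Benchmark evaluation code. In particular, the page does not say how MAE is aggregated, and averaging over all spots and genes is an assumption here. The function names are illustrative. For real evaluations, scikit-learn provides `mean_absolute_error` and `balanced_accuracy_score`.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """MAE averaged over every (spot, gene) entry; this aggregation is an assumption."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so rare classes weigh as much as common ones."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


def harrell_c_index(times, events, risk):
    """Harrell's concordance index for right-censored survival data.

    Pair (i, j) is comparable when subject i had an observed event and
    times[i] < times[j]; it is concordant when risk[i] > risk[j].
    Tied risk scores count as half-concordant.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


print(mean_absolute_error([[1, 2], [3, 4]], [[1.5, 2], [2, 4]]))  # 0.375
print(round(balanced_accuracy([0, 0, 0, 1], [0, 0, 1, 1]), 4))   # 0.8333
print(harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.2, 0.3, 0.1]))  # 0.8
```

In the C-index example, 4 of the 5 comparable pairs are ordered correctly. Subject 2 is censored, so it only appears as the longer-surviving member of a pair.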
