Learning Spatial-Preserving Hierarchical Representations for Digital Pathology
About
Whole slide images (WSIs) pose fundamental computational challenges due to their gigapixel resolution and the sparse distribution of informative regions. Existing approaches often treat image patches independently or reshape them in ways that distort spatial context, thereby obscuring the hierarchical pyramid representations intrinsic to WSIs. We introduce Sparse Pyramid Attention Networks (SPAN), a hierarchical framework that preserves spatial relationships while allocating computation to informative regions. SPAN constructs multi-scale representations directly from single-scale inputs, enabling precise hierarchical modeling of WSI data. We demonstrate SPAN's versatility through two variants: SPAN-MIL for slide classification and SPAN-UNet for segmentation. Comprehensive evaluations across multiple public datasets show that SPAN effectively captures hierarchical structure and contextual relationships. Our results provide clear evidence that architectural inductive biases and hierarchical representations enhance both slide-level and patch-level performance. By addressing key computational challenges in WSI analysis, SPAN provides an effective framework for computational pathology and demonstrates important design principles for large-scale medical image analysis.
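To make the idea of building multi-scale representations from single-scale inputs concrete, the following is a minimal sketch and not the authors' implementation. It assumes each retained (informative) patch has an integer grid coordinate and a feature vector, and it forms coarser levels by averaging the features of patches that fall in the same 2x2 parent cell. The function name `build_sparse_pyramid`, the mean-pooling rule, and the factor of 2 are illustrative assumptions; SPAN itself uses learned attention-based components that are not reproduced here.

```python
import numpy as np


def build_sparse_pyramid(coords, feats, num_levels=3):
    """Hypothetical helper: coarsen sparse patch features level by level.

    coords: (N, 2) integer grid positions (row, col) of the retained patches.
    feats:  (N, D) feature vectors for those patches.
    Returns a list of (coords, feats) pairs, finest level first.
    """
    coords = np.asarray(coords, dtype=np.int64)
    feats = np.asarray(feats, dtype=np.float64)
    levels = [(coords, feats)]
    for _ in range(num_levels - 1):
        # Assumption: each 2x2 block of grid cells maps to one parent cell.
        parent = coords // 2
        # Unique occupied parent cells, plus the parent index of every child.
        uniq, inverse = np.unique(parent, axis=0, return_inverse=True)
        inverse = inverse.reshape(-1)
        # Mean-pool child features into their parent (empty cells never appear).
        sums = np.zeros((len(uniq), feats.shape[1]))
        np.add.at(sums, inverse, feats)
        counts = np.bincount(inverse, minlength=len(uniq))
        coords, feats = uniq, sums / counts[:, None]
        levels.append((coords, feats))
    return levels


if __name__ == "__main__":
    coords = [[0, 0], [0, 1], [1, 1], [4, 6]]
    feats = [[1.0, 0.0], [3.0, 0.0], [5.0, 0.0], [7.0, 2.0]]
    for level, (c, f) in enumerate(build_sparse_pyramid(coords, feats)):
        print(level, c.tolist(), f.tolist())
```

Because only occupied cells are stored, background regions cost nothing at any level, and the integer coordinates keep each token's position on the slide grid explicit rather than flattening patches into an unordered bag.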
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Survival Prediction | LUAD | C-index | 0.57 | 50 |
| Classification | BRACS | Accuracy | 77.8 | 44 |
| Survival Prediction | LUSC | C-index | 0.584 | 24 |
| Classification | Yale HER2 | Accuracy | 86 | 18 |
| Survival Prediction | TCGA LGG | C-index | 0.647 | 15 |
| Segmentation | CAMELYON-16 | Dice Score | 90.8 | 12 |
| Segmentation | SegCAMELYON | Dice Coefficient | 88.7 | 12 |
| Segmentation | Yale HER2 | Dice Coefficient | 63 | 12 |
| Segmentation | BACH | Dice Score | 83 | 12 |