Spatial Blindness in Whole-Slide Multiple Instance Learning

About

Whole-slide MIL models are often called context-aware once graphs, Transformers, or state-space modules are placed above patch embeddings. We show that this label can be deceptive. On pathology tasks where tissue architecture is part of the diagnostic signal, several strong MIL baselines retain nearly unchanged slide-level AUC after patch coordinates are permuted. Their predictions are accurate, but largely compositional. We refer to this failure mode as spatial blindness. Our explanation is optimization-based: dense appearance statistics are learned early under slide-level supervision, leaving weak gradients for sparse spatial relations. ResTopoMIL addresses the issue by first fitting a permutation-invariant prototype histogram and then freezing it while a lightweight graph branch learns the residual under a coordinate-shuffling constraint. The architecture is simple by design; the intervention is in how the spatial branch is trained. Across 9 public WSI benchmarks, ResTopoMIL improves classification and survival prediction with 1.15M parameters, restores sensitivity to coordinate perturbation, and gives stronger localization evidence on CAMELYON-16.
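The coordinate-permutation check described in the abstract can be illustrated with a small sketch. This is not the authors' evaluation code: the function names, the toy models, and the per-slide score difference used here (in place of slide-level AUC over a dataset) are all assumptions made for illustration. The idea is only that a model whose output does not change when patch coordinates are reassigned at random is, in the abstract's terms, purely compositional.

```python
import numpy as np

def shuffle_coordinates(coords, rng):
    """Reassign patch coordinates at random; patch features keep their order."""
    return coords[rng.permutation(len(coords))]

def spatial_sensitivity(model_fn, feats, coords, n_trials=20, seed=0):
    """Mean absolute change in a slide score after coordinate shuffling.

    Hypothetical helper: the paper reports AUC changes across slides,
    whereas this sketch measures the score change on a single slide.
    """
    rng = np.random.default_rng(seed)
    base = model_fn(feats, coords)
    deltas = [
        abs(model_fn(feats, shuffle_coordinates(coords, rng)) - base)
        for _ in range(n_trials)
    ]
    return float(np.mean(deltas))

def compositional_model(feats, coords):
    """Toy stand-in for a spatially blind model: ignores coordinates."""
    return float(feats.mean())

def spatial_model(feats, coords):
    """Toy stand-in for a spatially aware model: feature-weighted x-position."""
    weights = feats[:, 0]
    return float((weights * coords[:, 0]).sum() / weights.sum())

# Toy slide: 50 patches on a 5 x 10 grid with random positive features.
data_rng = np.random.default_rng(42)
feats = data_rng.random((50, 4)) + 0.1
coords = np.stack(
    np.meshgrid(np.arange(10), np.arange(5)), axis=-1
).reshape(-1, 2).astype(float)

blind_score = spatial_sensitivity(compositional_model, feats, coords)
aware_score = spatial_sensitivity(spatial_model, feats, coords)
print("compositional:", blind_score)
print("spatial:", aware_score)
```

The compositional toy model yields a sensitivity of exactly zero because shuffling leaves its inputs unchanged, while the spatial toy model yields a positive value. A real check would replace the toy functions with a trained MIL model and compare a dataset-level metric before and after shuffling.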

Xiangyu Li, Ran Su • 2026

Related benchmarks

Task	Dataset	Result
Survival Prediction	TCGA-LUAD	C-index 0.6457	213
Survival Prediction	TCGA-UCEC	C-index 0.7058	184
Survival Prediction	TCGA-STAD	C-index 0.6807	125
Survival Prediction	KIRC TCGA	C-index 0.7313	102
Cancer Classification	TCGA-BRCA	Accuracy 95.68	94
Survival Prediction	TCGA-KIRP	C-index 0.8182	63
WSI Classification	Panda	Accuracy 75.46	32
WSI Classification	TCGA-NSCLC	Accuracy 91.57	28
Classification	PANDA (test)	Accuracy 73.5	19
Tumor localization	CAMELYON-16	Dice 0.624	14

Showing 10 of 16 rows
