ASMIL: Attention-Stabilized Multiple Instance Learning for Whole Slide Imaging
About
Attention-based multiple instance learning (MIL) has emerged as a powerful framework for whole slide image (WSI) diagnosis, leveraging attention to aggregate instance-level features into bag-level predictions. Despite this success, we find that such methods exhibit a new failure mode: unstable attention dynamics. Across four representative attention-based MIL methods and two public WSI datasets, we observe that attention distributions oscillate across epochs rather than converging to a consistent pattern, which degrades performance. This instability adds to two previously reported challenges: overfitting and over-concentrated attention distributions. To overcome all three limitations simultaneously, we introduce attention-stabilized multiple instance learning (ASMIL), a novel unified framework. ASMIL uses an anchor model to stabilize attention, replaces softmax with a normalized sigmoid function in the anchor to prevent over-concentration, and applies token random dropping to mitigate overfitting. Extensive experiments demonstrate that ASMIL achieves up to a 6.49% F1 score improvement over state-of-the-art methods. Moreover, integrating the anchor model and normalized sigmoid into existing attention-based MIL methods consistently boosts their performance, with F1 score gains of up to 10.73%. All code and data are publicly available at https://github.com/Linfeng-Ye/ASMIL.
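To make two of the ingredients named above more concrete, here is a minimal pure-Python sketch of a normalized sigmoid (each score passed through a sigmoid, then rescaled so the weights sum to 1) next to a standard softmax, plus a simple form of token random dropping. The function names, the exact normalization, the per-instance Bernoulli dropping, and the example scores are all illustrative assumptions, not the authors' implementation; the linked repository is the authoritative reference. The anchor model is not sketched because the abstract does not describe how it is built.

```python
import math
import random


def softmax(scores):
    """Standard softmax over raw attention scores (max-shifted for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def normalized_sigmoid(scores):
    """Assumed form: sigmoid each score, then rescale so weights sum to 1."""
    sig = [sigmoid(s) for s in scores]
    total = sum(sig)
    return [v / total for v in sig]


def random_token_drop(instances, drop_rate, rng):
    """Assumed form: keep each instance with probability 1 - drop_rate.

    Intended for training only. Falls back to a single randomly chosen
    instance so the bag is never empty.
    """
    kept = [x for x in instances if rng.random() >= drop_rate]
    return kept if kept else [rng.choice(instances)]


# Hypothetical attention scores for a bag of four instances,
# with one score much larger than the rest.
scores = [8.0, 2.0, 1.0, 0.0]
soft = softmax(scores)
nsig = normalized_sigmoid(scores)

# Softmax puts almost all weight on the top instance, while the
# normalized sigmoid caps its share because each sigmoid output is below 1.
print("softmax max weight:           ", max(soft))
print("normalized sigmoid max weight:", max(nsig))

rng = random.Random(0)
bag = list(range(10))
print("bag after dropping:", random_token_drop(bag, 0.5, rng))
```

Because every sigmoid output lies in (0, 1), a single very large score cannot dominate the normalized weights the way it can under the exponential in softmax, which is one plausible reading of how the substitution limits over-concentration.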
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Survival Prediction | TCGA-LUAD | C-index | 0.6001 | 154 |
| Survival Prediction | TCGA-UCEC | C-index | 0.7243 | 142 |
| Survival Prediction | TCGA-BRCA | C-index | 0.6396 | 101 |
| Survival Prediction | TCGA-BLCA | C-index | 0.6133 | 94 |
| Survival Analysis | TCGA-GBMLGG | C-index | 0.8036 | 44 |
| Multiple Instance Learning Classification | MUSK1 | Accuracy | 97.1 | 26 |
| Multiple Instance Learning Classification | MUSK2 | Accuracy | 96.8 | 26 |
| Multiple Instance Learning Classification | Elephant | Accuracy | 98.5 | 26 |
| WSI subtyping | CAMELYON-16 | F1 Score | 96.5 | 24 |
| WSI subtyping | CAMELYON 17 | F1 Score | 68.9 | 24 |