Layout-Guided Controllable Pathology Image Generation with In-Context Diffusion Transformers
About
Controllable pathology image synthesis requires reliable control over spatial layout, tissue morphology, and semantic detail. However, existing text-guided diffusion models offer only coarse, global control and cannot enforce fine-grained structural constraints. Progress is further limited by the lack of large datasets that pair patch-level spatial layouts with detailed diagnostic descriptions, because producing such annotations for gigapixel whole-slide images is prohibitively time-consuming for human experts. To address these challenges, we first develop a scalable multi-agent annotation framework built on large vision-language models (LVLMs). It integrates image description, diagnostic step extraction, and automatic quality judgment into a coordinated pipeline, and we assess its reliability through human verification. This framework enables efficient, large-scale construction of fine-grained, clinically aligned supervision. Building on the curated data, we propose the In-Context Diffusion Transformer (IC-DiT), a layout-aware generative model that incorporates spatial layouts, textual descriptions, and visual embeddings into a unified diffusion transformer. Through hierarchical multimodal attention, IC-DiT maintains global semantic coherence while accurately preserving structural and morphological details. Extensive experiments on five histopathology datasets show that IC-DiT achieves higher fidelity, stronger spatial controllability, and better diagnostic consistency than existing methods. In addition, the generated images serve as effective data augmentation for downstream tasks such as cancer classification and survival analysis.
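IC-DiT is described as placing spatial layouts, textual descriptions, and visual embeddings in a single diffusion transformer. The NumPy sketch below illustrates only the general in-context idea, under stated assumptions: every condition is projected to a shared width `D`, concatenated with the noisy image tokens into one sequence, and processed by one joint self-attention layer. It uses flat single-head attention, not the paper's hierarchical multimodal attention, and the function names, token ordering, patch size, and random weights are all hypothetical.

```python
import numpy as np


def patchify(x, patch):
    """Split an (H, W, C) array into non-overlapping patch tokens of shape (N, patch*patch*C)."""
    h, w, c = x.shape
    assert h % patch == 0 and w % patch == 0, "spatial size must be divisible by patch"
    x = x.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return x.reshape(-1, patch * patch * c)


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def joint_attention(tokens, wq, wk, wv):
    """Single-head self-attention in which every token attends to every other token."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def in_context_block(latent, layout, text_emb, params, patch=2):
    """Hypothetical in-context conditioning block (not the paper's exact design).

    latent:   (H, W, C) noisy image latent
    layout:   (H, W, K) one-hot layout mask aligned with the latent grid
    text_emb: (T, D) token embeddings, assumed to come from some text encoder
    """
    img_tok = patchify(latent, patch) @ params["img_proj"]  # (N, D)
    lay_tok = patchify(layout, patch) @ params["lay_proj"]  # (N, D)
    seq = np.concatenate([text_emb, lay_tok, img_tok], axis=0)  # (T + 2N, D)
    out = seq + joint_attention(seq, params["wq"], params["wk"], params["wv"])  # residual
    return out[-img_tok.shape[0]:]  # keep only the image tokens for the denoiser head


# Usage with random weights and toy sizes.
rng = np.random.default_rng(0)
H, W, C, K, T, D, patch = 8, 8, 4, 3, 5, 16, 2
params = {
    "img_proj": rng.normal(size=(patch * patch * C, D)) * 0.1,
    "lay_proj": rng.normal(size=(patch * patch * K, D)) * 0.1,
    "wq": rng.normal(size=(D, D)) * 0.1,
    "wk": rng.normal(size=(D, D)) * 0.1,
    "wv": rng.normal(size=(D, D)) * 0.1,
}
latent = rng.normal(size=(H, W, C))
layout = np.eye(K)[rng.integers(0, K, size=(H, W))]  # (H, W, K) one-hot region labels
text_emb = rng.normal(size=(T, D))
out = in_context_block(latent, layout, text_emb, params, patch)
print(out.shape)  # (16, 16): 16 image patches, width 16
```

Because layout and text tokens share one attention context with the image tokens, every image patch can attend directly to the layout labels and the description. A practical model would stack many such blocks and add timestep conditioning, normalization, and multi-head attention.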
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Survival Prediction | TCGA-UCEC | C-index | 0.7415 | 142 |
| Survival Prediction | BLCA | C-index | 0.7255 | 66 |
| Survival Prediction | BRCA | C-index | 0.7154 | 66 |
| Survival Prediction | LUAD | C-index | 0.7251 | 50 |
| Survival Prediction | GBMLGG | C-index | 0.8811 | 20 |
| Cancer Classification | TCGA cohorts (BLCA, BRCA, GBMLGG, LUAD, UCEC), downstream tasks | Accuracy (BLCA cohort) | 89.86 | 10 |
| Mask-to-Image Faithfulness | BLCA TCGA (test) | Faithfulness Score | 83.19 | 10 |
| Mask-to-Image Faithfulness | BRCA TCGA (test) | Faithfulness Score | 84.12 | 10 |
| Mask-to-Image Faithfulness | GBMLGG TCGA (test) | Faithfulness Score | 83.52 | 10 |
| Mask-to-Image Faithfulness | LUAD TCGA (test) | Faithfulness Score | 83.63 | 10 |
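The survival-prediction rows report the concordance index (C-index). For reference, here is a minimal pure-Python sketch of Harrell's C-index, assuming right-censored data: a pair is comparable only when the subject with the earlier time had an observed event, pairs with tied times are skipped, and tied risk scores count as half-concordant. The benchmark entries may rely on a different estimator or tie-handling convention, so treat this as illustrative only.

```python
def harrell_c_index(times, events, risks):
    """Fraction of comparable pairs whose predicted risks are ordered correctly.

    times:  follow-up times
    events: 1 if the event was observed, 0 if censored
    risks:  model risk scores (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:  # i had the event strictly before j's time
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example: 5 comparable pairs, 4 ordered correctly.
print(harrell_c_index([1, 2, 3, 4], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1]))  # 0.8
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the survival results above are reported.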