Beyond Pixel Simulation: Pathology Image Generation via Diagnostic Semantic Tokens and Prototype Control

About

In computational pathology, understanding and generation have evolved along disparate paths: advanced understanding models already exhibit diagnostic-level competence, whereas generative models largely simulate pixels. Progress remains hindered by three coupled factors: the scarcity of large, high-quality image-text corpora; the lack of precise, fine-grained semantic control, which forces reliance on non-semantic cues; and terminological heterogeneity, where diverse phrasings for the same diagnostic concept impede reliable text conditioning. We introduce UniPath, a semantics-driven pathology image generation framework that leverages mature diagnostic understanding to enable controllable generation. UniPath implements Multi-Stream Control with three streams: a Raw-Text stream; a High-Level Semantics stream that uses learnable queries to a frozen pathology MLLM to distill paraphrase-robust Diagnostic Semantic Tokens and to expand prompts into diagnosis-aware attribute bundles; and a Prototype stream that affords component-level morphological control via a prototype bank. On the data front, we curate a 2.65M image-text corpus and a finely annotated, high-quality 68K subset to alleviate data scarcity. For a comprehensive assessment, we establish a four-tier evaluation hierarchy tailored to pathology. Extensive experiments demonstrate UniPath's state-of-the-art performance, including a Patho-FID of 80.9 (51% better than the second-best method) and fine-grained semantic control that reaches 98.7% of the real-image level. The dataset and code are available at https://github.com/Hanminghao/UniPath.

Minghao Han, Yichen Liu, Yizhou Liu, Zizhi Chen, Jingqun Tang, Xuecheng Wu, Dingkang Yang, Lihua Zhang • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Classification | Kather-CRC 2016 | Weighted F1 | 87.15 | 35 |
| Pathological Multimodal Understanding | PathMMU ALL (test) | PubMed Accuracy | 66.4 | 16 |
| Pathological Multimodal Understanding | PathMMU Tiny (test) | PubMed Score | 72.9 | 15 |
| Fine-grained Control | Cytology Type 4-classes | Weighted F1 | 81.49 | 12 |
| Fine-grained Control | Hemorrhage 2-classes | Weighted F1 | 77.02 | 12 |
| Text-to-Image Generation | Pathological T2I/I2I Merged (test) | FID | 484.4 | 9 |
| Pathology Text-to-Image Generation | 10K High-Quality Pathology 1.0 (test) | CLIP-Score | 0.348 | 9 |
| Image-to-Image Generation | Pathological T2I/I2I Merged (test) | Recall@10 | 4.25 | 8 |
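
Several rows above report Weighted F1. As a reading aid, here is a minimal sketch of how that metric is conventionally defined: the per-class F1 scores averaged with weights proportional to each class's number of true samples. The function name `weighted_f1` and the toy labels are hypothetical and are not taken from the UniPath code. The function returns a fraction in [0, 1], whereas the table appears to list percentages.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (illustrative helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")

    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for cls, n_cls in support.items():
        # Count true positives and false positives for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n_cls - tp  # true samples of this class that were missed
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Weight each class by its share of the true labels.
        score += (n_cls / total) * f1
    return score


# Toy example with made-up labels (not data from the paper).
y_true = ["tumor", "tumor", "stroma", "stroma"]
y_pred = ["tumor", "stroma", "stroma", "stroma"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Classes that appear only in the predictions have zero support and therefore contribute nothing to the weighted average, although their false positives still lower the F1 of the classes they were confused with.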