PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration

About

Vision Language Models (VLMs) like CLIP have attracted substantial attention in pathology, serving as backbones for applications such as zero-shot image classification and Whole Slide Image (WSI) analysis. Additionally, they can function as vision encoders when combined with large language models (LLMs) to support broader capabilities. Current efforts to train pathology VLMs rely on pathology image-text pairs from platforms like PubMed, YouTube, and Twitter, which provide limited, unscalable data with generally suboptimal image quality. In this work, we leverage large-scale WSI datasets like TCGA to extract numerous high-quality image patches. We then train a large multimodal model to generate captions for these images, creating PathGen-1.6M, a dataset containing 1.6 million high-quality image-caption pairs. Our approach involves multiple agent models collaborating to extract representative WSI patches, generating and refining captions to obtain high-quality image-text pairs. Extensive experiments show that integrating these generated pairs with existing datasets to train a pathology-specific CLIP model, PathGen-CLIP, significantly enhances its ability to analyze pathological images, with substantial improvements across nine pathology-related zero-shot image classification tasks and three whole-slide image tasks. Furthermore, we construct 200K instruction-tuning data based on PathGen-1.6M and integrate PathGen-CLIP with the Vicuna LLM to create more powerful multimodal models through instruction tuning. Overall, we provide a scalable pathway for high-quality data generation in pathology, paving the way for next-generation general pathology models.
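To make the zero-shot classification setting mentioned above concrete, here is a minimal sketch of how a CLIP-style model typically scores an image against text prompts: both embeddings are L2-normalized, their cosine similarity is scaled, and a softmax picks the best label. The vectors, the label names, the `zero_shot_classify` helper, and the `scale` value are all illustrative assumptions; none of them come from PathGen-CLIP itself, where the embeddings would be produced by the trained image and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_classify(image_emb, text_embs, labels, scale=100.0):
    """Return (predicted label, class probabilities) for one image embedding.

    `scale` is an assumed temperature applied to the cosine similarities;
    it is a hypothetical default, not a value taken from the paper.
    """
    img = l2_normalize(image_emb)
    logits = []
    for emb in text_embs:
        txt = l2_normalize(emb)
        logits.append(scale * sum(a * b for a, b in zip(img, txt)))

    # Numerically stable softmax over the scaled similarities.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs


# Toy embeddings standing in for encoder outputs (hypothetical values).
labels = ["tumor", "normal"]
text_embs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]
image_emb = [0.8, 0.2, 0.1]

pred, probs = zero_shot_classify(image_emb, text_embs, labels)
print(pred, [round(p, 3) for p in probs])
```

In practice each label would be wrapped in a text prompt and encoded by the model's text encoder, so no task-specific training is needed to add or change classes.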

Yuxuan Sun, Yunlong Zhang, Yixuan Si, Chenglu Zhu, Zhongyi Shui, Kai Zhang, Jingxiong Li, Xingheng Lyu, Tao Lin, Lin Yang • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Image Classification | PCAM | Top-1 Acc 88.2 | 58 |
| Visual Question Answering | SlideBench-VQA TCGA | Microscopy Score 68.67 | 32 |
| Visual Question Answering | PathMMU Tiny 1.0 (test) | Overall Accuracy 60.1 | 24 |
| Visual Question Answering | PathMMU 1.0 (ALL test) | Overall Score 58.4 | 22 |
| Classification | BACH | Accuracy 71.5 | 19 |
| WSI Classification | TCGA-RCC | -- | 18 |
| Pathological Multimodal Understanding | PathMMU ALL (test) | PubMed Accuracy 59.9 | 16 |
| Gene Mutation Prediction | CPTAC | BRCA PIK3CA AUC 0.586 | 15 |
| Pathological Multimodal Understanding | PathMMU Tiny (test) | PubMed Score 59.3 | 15 |
| Classification | SkinCancer | Accuracy 70.6 | 14 |
Showing 10 of 58 rows