
SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding

About

Despite the progress made by multimodal large language models (MLLMs) in computational pathology, they remain limited by a predominant focus on patch-level analysis, missing essential contextual information at the whole-slide level. The lack of large-scale instruction datasets and the gigapixel scale of whole-slide images (WSIs) pose significant developmental challenges. In this paper, we present SlideChat, the first vision-language assistant capable of understanding gigapixel whole-slide images, exhibiting excellent multimodal conversational capability and responding to complex instructions across diverse pathology scenarios. To support its development, we created SlideInstruction, the largest instruction-following dataset for WSIs, consisting of 4.2K WSI captions and 176K VQA pairs across multiple categories. Furthermore, we propose SlideBench, a multimodal benchmark that incorporates captioning and VQA tasks to assess SlideChat's capabilities in varied clinical settings such as microscopy and diagnosis. Compared to both general and specialized MLLMs, SlideChat exhibits exceptional capabilities, achieving state-of-the-art performance on 18 of 22 tasks. For example, it achieved an overall accuracy of 81.17% on SlideBench-VQA (TCGA) and 54.15% on SlideBench-VQA (BCNB). Our code, data, and model are publicly accessible at https://uni-medical.github.io/SlideChat.github.io.

Ying Chen, Guoan Wang, Yuanfeng Ji, Yanjun Li, Jin Ye, Tianbin Li, Ming Hu, Rongshan Yu, Yu Qiao, Junjun He • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Visual Question Answering | SlideBench-VQA TCGA | Microscopy Score | 87.64 | 32 |
| Visual Question Answering | WSI-VQA | Overall Accuracy | 60.18 | 25 |
| Visual Question Answering | SlideBench-VQA BCNB | Overall | 54.14 | 25 |
| Visual Question Answering | PathMMU Tiny 1.0 (test) | Overall Accuracy | 46.01 | 24 |
| Visual Question Answering | PathMMU 1.0 (ALL test) | Overall Score | 45.65 | 22 |
| Whole-slide image visual-question answering | SlideBench TCGA | Accuracy | 75.36 | 14 |
| Whole-slide image visual-question answering | CPTAC | Accuracy | 48.75 | 14 |
| Open-ended Pathology Analysis | PathReasoner (test) | BLEU | 0.08 | 14 |
| WSI Captioning | SlideBench | BLEU-1 | 0.37 | 11 |
| Whole Slide Image Analysis | WSI-Bench (test) | Morphological Analysis Open WSI P | 26.9 | 10 |
Showing 10 of 15 rows
