SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding

About

Despite the progress made by multimodal large language models (MLLMs) in computational pathology, they remain limited by a predominant focus on patch-level analysis, missing essential contextual information at the whole-slide level. The lack of large-scale instruction datasets and the gigapixel scale of whole-slide images (WSIs) pose significant developmental challenges. In this paper, we present SlideChat, the first vision-language assistant capable of understanding gigapixel whole-slide images, exhibiting excellent multimodal conversational capability and responding to complex instructions across diverse pathology scenarios. To support its development, we created SlideInstruction, the largest instruction-following dataset for WSIs, consisting of 4.2K WSI captions and 176K VQA pairs across multiple categories. Furthermore, we propose SlideBench, a multimodal benchmark that incorporates captioning and VQA tasks to assess SlideChat's capabilities in varied clinical settings such as microscopy and diagnosis. Compared to both general and specialized MLLMs, SlideChat exhibits exceptional capabilities, achieving state-of-the-art performance on 18 of 22 tasks. For example, it achieved an overall accuracy of 81.17% on SlideBench-VQA (TCGA) and 54.15% on SlideBench-VQA (BCNB). Our code, data, and model are publicly accessible at https://uni-medical.github.io/SlideChat.github.io.

Ying Chen, Guoan Wang, Yuanfeng Ji, Yanjun Li, Jin Ye, Tianbin Li, Ming Hu, Rongshan Yu, Yu Qiao, Junjun He • 2024

Related benchmarks

Task	Dataset	Metric	Result	
Visual Question Answering	SlideBench-VQA TCGA	Microscopy Score	87.64	44
Visual Question Answering	WSI-VQA	Overall Accuracy	60.18	43
Visual Question Answering	PathMMU Tiny 1.0 (test)	Overall Accuracy	46.01	42
Visual Question Answering	SlideBench-VQA BCNB	Overall	54.14	37
Visual Question Answering	PathMMU 1.0 (ALL test)	Overall Score	45.65	22
Visual Question Answering	SB VQA	Balanced Accuracy	70.5	20
Visual Question Answering	Panda	Balanced Accuracy	17	20
Visual Question Answering	Expert VQA	Balanced Accuracy	37.5	20
Visual Question Answering	TCGA	Balanced Accuracy	3.3	20
Visual Question Answering	GTEx	Balanced Accuracy	4.7	20

Showing 10 of 50 rows
