ReinPath: A Multimodal Reinforcement Learning Approach for Pathology

About

Interpretability is important in computational pathology, which has motivated methods that integrate histopathological images with their corresponding text data. However, existing multimodal methods offer limited interpretability, both because they lack high-quality datasets that support explicit reasoning and inference and because their reasoning processes are simple. To address these problems, we introduce a novel multimodal pathology large language model with strong reasoning capabilities. To improve the generation of accurate and contextually relevant textual descriptions, we design a semantic reward strategy integrated with group relative policy optimization. We also construct a high-quality pathology visual question answering (VQA) dataset specifically designed to support complex reasoning tasks. Comprehensive experiments on this dataset demonstrate that our method outperforms state-of-the-art methods, even when trained with only 20% of the data. Our method also achieves performance comparable to CLIP on the downstream zero-shot image classification task.
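The abstract mentions group relative policy optimization (GRPO) without detail. As a rough illustration only, the group-relative step is commonly implemented by normalizing the rewards of several responses sampled for the same prompt. The sketch below assumes scalar rewards are already available; the reward values, the function name `group_relative_advantages`, and the choice of population standard deviation are assumptions for illustration, not the authors' implementation or their semantic reward.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so responses
    are scored relative to their siblings rather than on an absolute scale.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward scores for four sampled answers to one question.
rewards = [0.9, 0.4, 0.4, 0.1]
advantages = group_relative_advantages(rewards)
```

Responses scoring above the group mean receive positive advantages and those below receive negative ones; if every response in a group gets the same reward, all advantages are zero and the group contributes no learning signal.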

Kangcheng Zhou, Jun Jiang, Qing Zhang, Shuang Zheng, Qingli Li, Shugong Xu • 2026

Related benchmarks

Task                       Dataset           Result                 Rank
Visual Question Answering  PathVQA (test)    Overall Accuracy 54.4  19
Classification             LC-Colon          Accuracy 95.2          13
Classification             LC-Lung           Accuracy 62.1          13
Visual Question Answering  PathMMU           PubMed Score 49.9      8
Image Classification       CRC               Accuracy 38.4          7
Image Classification       WSSS4LUAD         Accuracy 83.2          6
Visual Question Answering  PMC-VQA (test)    Accuracy 38.6          5
Visual Question Answering  Quilt-VQA (test)  Recall 60.1            5