ReinPath: A Multimodal Reinforcement Learning Approach for Pathology

About

Interpretability is important in computational pathology, which has motivated multimodal methods that integrate histopathological images with corresponding text data. However, existing multimodal methods offer limited interpretability, both because high-quality datasets that support explicit reasoning and inference are lacking and because their reasoning processes are simple. To address these problems, we introduce a novel multimodal pathology large language model with strong reasoning capabilities. To improve the generation of accurate and contextually relevant textual descriptions, we design a semantic reward strategy integrated with group relative policy optimization. We also construct a high-quality pathology visual question answering (VQA) dataset specifically designed to support complex reasoning tasks. Comprehensive experiments on this dataset demonstrate that our method outperforms state-of-the-art methods, even when trained with only 20% of the data. Our method also achieves performance comparable to CLIP on a downstream zero-shot image classification task.

Kangcheng Zhou, Jun Jiang, Qing Zhang, Shuang Zheng, Qingli Li, Shugong Xu • 2026
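
The abstract names a semantic reward combined with group relative policy optimization but gives no formulas. As a rough illustration only, the sketch below shows the commonly described group-relative step: several candidate answers are sampled for one question, each is scored, and each score is normalized against the group's mean and standard deviation. The reward function here (`cosine_bow`, a bag-of-words cosine similarity) is a hypothetical stand-in chosen so the example runs without external libraries; it is not the paper's semantic reward, and the function names and `eps` value are assumptions.

```python
import math
from collections import Counter


def cosine_bow(candidate: str, reference: str) -> float:
    """Hypothetical stand-in reward: cosine similarity of word-count vectors.

    This is NOT the paper's semantic reward; it only gives a number in [0, 1]
    that grows as the candidate shares more words with the reference.
    """
    ca = Counter(candidate.lower().split())
    cb = Counter(reference.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    advantage_i = (r_i - mean(r)) / (std(r) + eps)
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


# Made-up example data: one reference answer and a group of sampled candidates.
reference = "the image shows adenocarcinoma of the colon"
candidates = [
    "the image shows adenocarcinoma of the colon",
    "this is colon adenocarcinoma",
    "normal lung tissue",
]
rewards = [cosine_bow(c, reference) for c in candidates]
advantages = group_relative_advantages(rewards)
```

Candidates scoring above the group mean receive positive advantages and those below receive negative ones, so no separate value network is needed to provide a baseline.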

Related benchmarks

Task	Dataset	Result
Visual Question Answering	PMC-VQA (test)	Accuracy 38.6	27
Classification	LC-Colon	Accuracy 95.2	25
Classification	LC-Lung	Accuracy 62.1	25
Visual Question Answering	PathVQA (test)	Overall Accuracy 54.4	19
Image Classification	CRC	Accuracy 38.4	19
Image Classification	WSSS4LUAD	--	9
Visual Question Answering	PathMMU	PubMed Score 49.9	8
Visual Question Answering	Quilt-VQA (test)	Recall 60.1	5

