Cross-modal Prototype Driven Network for Radiology Report Generation

About

Radiology report generation (RRG) aims to automatically describe a radiology image with human-like language and could potentially support the work of radiologists, reducing the burden of manual reporting. Previous approaches often adopt an encoder-decoder architecture and focus on single-modal feature learning, while few studies explore cross-modal feature interaction. Here we propose a Cross-modal PROtotype driven NETwork (XPRONET) to promote cross-modal pattern learning and exploit it to improve the task of radiology report generation. This is achieved by three well-designed, fully differentiable and complementary modules: a shared cross-modal prototype matrix to record the cross-modal prototypes; a cross-modal prototype network to learn the cross-modal prototypes and embed the cross-modal information into the visual and textual features; and an improved multi-label contrastive loss to enable and enhance multi-label prototype learning. XPRONET obtains substantial improvements on the IU-Xray and MIMIC-CXR benchmarks, exceeding recent state-of-the-art approaches by a large margin on IU-Xray and achieving comparable performance on MIMIC-CXR.
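To make the idea of a shared prototype matrix more concrete, here is a minimal sketch of how a single feature vector might query such a matrix and absorb the result. It is not the authors' implementation: the abstract does not specify the similarity function, the selection rule, or the fusion step, so the dot-product similarity, the `top_k` selection, the softmax weighting, and the residual addition below are all assumptions, and the names `query_prototypes` and `prototypes` are hypothetical. The multi-label contrastive loss is not sketched.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def query_prototypes(feature, prototypes, top_k=2):
    """Fuse a feature with a weighted mix of its top_k most similar prototypes.

    All design choices here (dot-product similarity, top_k selection,
    residual fusion) are illustrative assumptions, not the paper's method.
    """
    # Similarity of the feature to every row of the shared prototype matrix.
    scores = [dot(feature, p) for p in prototypes]
    # Indices of the top_k most similar prototypes.
    top = sorted(range(len(prototypes)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Normalise the selected similarities into mixing weights.
    weights = softmax([scores[i] for i in top])
    # Weighted sum of the selected prototypes.
    response = [
        sum(w * prototypes[i][d] for w, i in zip(weights, top))
        for d in range(len(feature))
    ]
    # Residual fusion: add the prototype response to the original feature.
    return [f + r for f, r in zip(feature, response)]


# One matrix is queried by both modalities, which is what makes it "shared".
prototypes = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
visual = query_prototypes([2.0, 0.1], prototypes)   # stand-in for a visual feature
textual = query_prototypes([0.1, 2.0], prototypes)  # stand-in for a textual feature
```

Because both calls read from the same `prototypes` list, any update to a prototype would affect the visual and textual features alike, which is the property the abstract attributes to the shared matrix.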

Jun Wang, Abhir Bhalerao, Yulan He • 2022

Related benchmarks

Task	Dataset	Result
Radiology Report Generation	MIMIC-CXR (test)	BLEU-4 0.105	235
Radiology Report Generation	IU-Xray (test)	ROUGE-L 0.411	116
Medical Report Generation	MIMIC-CXR (test)	ROUGE-L 0.279	100
Radiology Report Generation	CheXpert Plus (test)	Precision 0.314	88
Medical Report Generation	IU-Xray (test)	ROUGE-L 0.387	56
Radiology Report Generation	IU-Xray	ROUGE-L Score 36.4	38
Radiology Report Generation	CHEXPERT Plus	R-L 0.265	37
Medical Report Generation	MIMIC-CXR	F1 Score 35.3	34
Report Generation	MIMIC-CXR (test)	BLEU-4 0.1052	20
CXR-to-report generation	OPENI (test)	BLEU-1 0.4114	18

Showing 10 of 12 rows
