
Masks and Manuscripts: Advancing Medical Pre-training with End-to-End Masking and Narrative Structuring

About

Contemporary medical contrastive learning faces challenges from inconsistent semantics and morphologically inconsistent sample pairs, leading to dispersed and converging semantic shifts. The variability of text reports, written by many different authors, further complicates semantic consistency. To tackle these issues, we propose a two-step approach. First, text reports are converted into a standardized {Entity, Position, Exist} triplet format, laying the groundwork for our concepts of "observations" and "verdicts": each triplet is refined into a binary question that guides the model toward a clear "verdict". Second, we introduce a Meijering-filter-based masking strategy for visual pre-training, focusing on features representative of the local context of medical images. By integrating this masking with our text-conversion method, our model advances cross-modal representation in a multimodal contrastive learning framework, setting new benchmarks in medical image analysis.
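The triplet-to-question step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `triplet_to_question` helper, the question template, and the sample triplets are hypothetical and not taken from the paper's actual pipeline.

```python
# Sketch: refine {Entity, Position, Exist} triplets into binary questions
# with a yes/no "verdict", as the abstract describes. Templates are assumed.

def triplet_to_question(entity, position, exist):
    """Turn one {Entity, Position, Exist} triplet into a binary question
    plus its ground-truth verdict."""
    question = f"Is there {entity} in the {position}?"
    verdict = "yes" if exist else "no"
    return question, verdict

# Example triplets that might be extracted from a free-text report such as
# "No pleural effusion. Opacity in the left lower lobe."
triplets = [
    ("pleural effusion", "chest", False),
    ("opacity", "left lower lobe", True),
]

for entity, position, exist in triplets:
    q, v = triplet_to_question(entity, position, exist)
    print(f"{q} -> {v}")
# Is there pleural effusion in the chest? -> no
# Is there opacity in the left lower lobe? -> yes
```

Standardizing every report into such question/verdict pairs removes author-specific phrasing, which is what makes the text side of the contrastive objective semantically consistent.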

Shreyank N Gowda, David A. Clifton • 2024

Related benchmarks

Task                         Dataset                 Metric       Result   Rank
Medical Image Segmentation   RSNA Pneumonia          Dice Score   76.68    49
Classification               CheXpert (test)         AUC ROC      90.88    48
Image Classification         SIIM-ACR (test)         AUROC        93.88    45
Classification               RSNA Pneumonia (test)   AUC-ROC      0.9191   27
Segmentation                 SIIM-ACR                Dice Score   80.28    27
Classification               RSNA Pneumonia          Accuracy     83.14    21
Image Classification         NIH ChestX-ray          Accuracy     88.52    21
Medical Image Segmentation   COVID-19                Dice Score   45.04    21
Image Classification         SIIM-ACR                Accuracy     86.15    12
Classification               Covid-19 CXR            AUC          75.15    5

(Showing 10 of 11 rows)
