MAIRA-1: A specialised large multimodal model for radiology report generation

About

We present a radiology-specific multimodal model for the task of generating radiological reports from chest X-rays (CXRs). Our work builds on the idea that large language models can be equipped with multimodal capabilities through alignment with pre-trained vision encoders. On natural images, this has been shown to allow multimodal models to gain image understanding and description capabilities. Our proposed model (MAIRA-1) leverages a CXR-specific image encoder in conjunction with a fine-tuned large language model based on Vicuna-7B, and text-based data augmentation, to produce reports with state-of-the-art quality. In particular, MAIRA-1 significantly improves on the radiologist-aligned RadCliQ metric and across all lexical metrics considered. Manual review of model outputs demonstrates promising fluency and accuracy of generated reports while uncovering failure modes not captured by existing evaluation practices. More information and resources can be found on the project website: https://aka.ms/maira.
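The alignment idea described above (connecting a pre-trained image encoder to a language model) can be illustrated with a minimal sketch. It assumes a LLaVA-style setup in which a small adapter projects each image-encoder patch feature into the language model's token-embedding space, and the projected "image tokens" are prepended to the text token embeddings. The names MLPAdapter and build_llm_inputs, the two-layer GELU MLP, and the toy dimensions are all hypothetical illustrations, not MAIRA-1's actual adapter or code; the real model uses a CXR-specific encoder and a Vicuna-7B-based LLM with far larger dimensions.

```python
import math
import random


def gelu(x: float) -> float:
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def linear(x, weight, bias):
    # x: d_in values; weight: d_out rows of d_in values; bias: d_out values
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b for row, b in zip(weight, bias)]


def init_linear(d_in, d_out, rng):
    # Small random weights and zero bias (toy initialisation, assumption)
    weight = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
    bias = [0.0] * d_out
    return weight, bias


class MLPAdapter:
    """Hypothetical two-layer MLP mapping image features (d_image) to LLM embeddings (d_text)."""

    def __init__(self, d_image, d_text, seed=0):
        rng = random.Random(seed)
        self.fc1 = init_linear(d_image, d_text, rng)
        self.fc2 = init_linear(d_text, d_text, rng)

    def __call__(self, patch_feature):
        hidden = [gelu(h) for h in linear(patch_feature, *self.fc1)]
        return linear(hidden, *self.fc2)


def build_llm_inputs(patch_features, text_embeddings, adapter):
    """Project every image patch and prepend the results to the text token embeddings."""
    image_tokens = [adapter(p) for p in patch_features]
    return image_tokens + text_embeddings


if __name__ == "__main__":
    adapter = MLPAdapter(d_image=8, d_text=4)
    patches = [[0.1 * i] * 8 for i in range(3)]            # 3 toy patch features
    text = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]   # 2 toy text embeddings
    sequence = build_llm_inputs(patches, text, adapter)
    print(len(sequence), len(sequence[0]))
```

Running the usage block prints "5 4": three projected image tokens followed by two text tokens, each of the LLM's (toy) embedding width. In this kind of design the language model then attends over image and text tokens jointly, which is what lets it condition the generated report on the image.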

Stephanie L. Hyland, Shruthi Bannur, Kenza Bouzid, Daniel C. Castro, Mercy Ranjit, Anton Schwaighofer, Fernando Pérez-García, Valentina Salvatelli, Shaury Srivastav, Anja Thieme, Noel Codella, Matthew P. Lungren, Maria Teodora Wetscherek, Ozan Oktay, Javier Alvarez-Valle · 2023

Related benchmarks

Task	Dataset	Metric	Result
Radiology Report Generation	MIMIC-CXR (test)	BLEU-4	0.142	235
Chest X-ray Report Generation	MIMIC-CXR (test)	F1 Macro (14)	38.6	21
Medical Image Report Labeling	MIMIC-CXR (test)	Macro F1 (14 Labels)	38.6	21
Radiology Report Generation	MIMIC-CXR FINDINGS section (test)	ROUGE-L	28.9	11
Medical Report Generation	MIMIC-CXR Frontal images (test)	CheXbert mF1-14	55.7	4
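Several rows above report a macro-averaged F1 over 14 labels. A minimal sketch of that computation follows, assuming each report (generated and reference) has already been converted into 14 binary finding flags, for example by a labeler such as CheXbert as the last row's metric name suggests. The binarisation and the choice to score a label as 0 when it is absent from both lists are assumptions here; published evaluations differ in how they handle uncertain mentions and empty labels. The function names are hypothetical.

```python
from typing import List, Sequence


def f1_for_label(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 for one label across all reports."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Assumed convention: a label that never occurs in either list scores 0
    return 0.0 if denom == 0 else 2 * tp / denom


def macro_f1(true_labels: List[List[int]], pred_labels: List[List[int]]) -> float:
    """Unweighted mean of per-label F1; each row holds one report's 0/1 flags."""
    n_labels = len(true_labels[0])
    per_label = []
    for j in range(n_labels):
        y_true = [row[j] for row in true_labels]
        y_pred = [row[j] for row in pred_labels]
        per_label.append(f1_for_label(y_true, y_pred))
    return sum(per_label) / n_labels


if __name__ == "__main__":
    reference = [[1, 0], [0, 1], [1, 1]]   # toy example with 2 labels instead of 14
    generated = [[1, 0], [1, 1], [0, 1]]
    print(macro_f1(reference, generated))
```

With this toy input the first label scores 0.5 and the second 1.0, so the script prints 0.75. Because every label counts equally in the average, rare findings influence macro F1 as much as common ones, which is why it is often reported alongside lexical metrics such as BLEU-4 and ROUGE-L.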

