RA-RRG: Multimodal Retrieval-Augmented Radiology Report Generation with Key Phrase Extraction

About

Automated radiology report generation (RRG) holds potential to reduce the workload of radiologists, and recent advances in multimodal large language models (MLLMs) have enabled multimodal chest X-ray (CXR) report generation. However, existing MLLMs are computationally expensive, require large-scale training data, and may produce hallucinated content, limiting their practical deployment. To address these limitations, we propose RA-RRG, a retrieval-augmented RRG framework that combines multimodal retrieval with large language models (LLMs) to generate radiology reports while reducing hallucinations and computational demands. RA-RRG uses LLMs to extract clinically essential key phrases from radiology reports and retrieves relevant phrases given an input image. By conditioning LLMs on the retrieved phrases, RA-RRG effectively suppresses hallucinations while maintaining strong report generation performance. Experiments on the MIMIC-CXR and IU X-ray datasets show state-of-the-art results on CheXbert metrics and competitive RadGraph F1 scores compared to MLLMs. Furthermore, RA-RRG naturally generalizes to multi-view RRG by aggregating phrases retrieved from multiple images, highlighting its broad applicability to real-world clinical scenarios. Code is available at https://github.com/deepnoid-ai/RA-RRG.
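The retrieve-then-generate flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the hand-written 3-dimensional embeddings, the cosine-similarity top-k retrieval, the helper names (`retrieve_phrases`, `aggregate_views`, `build_prompt`), and the prompt wording are all assumptions made for illustration. In a real system the embeddings would come from trained image and text encoders, and the prompt would be passed to an LLM.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve_phrases(image_emb, phrase_bank, k=2):
    """Return the k phrases whose embeddings are most similar to the image embedding."""
    ranked = sorted(phrase_bank, key=lambda item: cosine(image_emb, item[1]), reverse=True)
    return [phrase for phrase, _ in ranked[:k]]

def aggregate_views(view_embs, phrase_bank, k=2):
    """Multi-view case: merge the phrases retrieved per image, dropping duplicates."""
    merged = []
    for emb in view_embs:
        for phrase in retrieve_phrases(emb, phrase_bank, k):
            if phrase not in merged:
                merged.append(phrase)
    return merged

def build_prompt(phrases):
    """Build the text prompt that would condition an LLM on the retrieved phrases."""
    bullets = "\n".join(f"- {p}" for p in phrases)
    return "Write a chest X-ray findings section using only these key phrases:\n" + bullets

# Hypothetical phrase bank: (key phrase, toy embedding) pairs.
PHRASE_BANK = [
    ("no pleural effusion", [1.0, 0.0, 0.0]),
    ("cardiomegaly", [0.0, 1.0, 0.0]),
    ("right lower lobe opacity", [0.0, 0.0, 1.0]),
]

frontal = [0.9, 0.1, 0.0]  # toy embedding of a frontal view
lateral = [0.1, 0.0, 0.9]  # toy embedding of a lateral view

phrases = aggregate_views([frontal, lateral], PHRASE_BANK, k=1)
prompt = build_prompt(phrases)
print(prompt)
```

Because the generator only sees phrases retrieved for the input images, its output stays anchored to content that already exists in the report corpus, which is the intuition behind the hallucination reduction claimed above.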

Jonggwon Park, Byungmu Yoon, Soobum Kim, Kyoyun Choi • 2025

Related benchmarks

Task	Dataset	Metric	Result	Rank
Radiology Report Generation	MIMIC-CXR (test)	BLEU-4	7	235
Radiology Report Generation	MIMIC-CXR FINDINGS section (test)	ROUGE-L	24.9	11
Radiology Report Generation	IU X-Ray (test)	ROUGE-L	27.2	9
