Improving Chest X-Ray Report Generation by Leveraging Warm Starting
About
Automatically generating a report from a patient's Chest X-Rays (CXRs) is a promising solution to reducing clinical workload and improving patient care. However, current CXR report generators -- which are predominantly encoder-to-decoder models -- lack the diagnostic accuracy to be deployed in a clinical setting. To improve CXR report generation, we investigate warm starting the encoder and decoder with recent open-source computer vision and natural language processing checkpoints, such as the Vision Transformer (ViT) and PubMedBERT. To this end, each checkpoint is evaluated on the MIMIC-CXR and IU X-Ray datasets. Our experimental investigation demonstrates that the Convolutional vision Transformer (CvT) ImageNet-21K and the Distilled Generative Pre-trained Transformer 2 (DistilGPT2) checkpoints are best for warm starting the encoder and decoder, respectively. Compared to the state-of-the-art ($\mathcal{M}^2$ Transformer Progressive), CvT2DistilGPT2 attained an improvement of 8.3% for CE F-1, 1.8% for BLEU-4, 1.6% for ROUGE-L, and 1.0% for METEOR. The reports generated by CvT2DistilGPT2 have a higher similarity to radiologist reports than previous approaches. This indicates that leveraging warm starting improves CXR report generation. Code and checkpoints for CvT2DistilGPT2 are available at https://github.com/aehrc/cvt2distilgpt2.
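The core idea of warm starting is to initialise a new model's parameters from an open-source checkpoint wherever the parameter names and shapes match, and randomly initialise only the remaining parameters (e.g. the new cross-attention layers that connect the CvT encoder to the DistilGPT2 decoder). The toy sketch below illustrates that principle with plain Python dictionaries standing in for state dicts; `warm_start` is a hypothetical helper, not the authors' code, and parameters are flattened to 1-D lists for simplicity.

```python
import random

def warm_start(target_shapes, checkpoint_params, seed=0):
    """Build a parameter dict for a new model.

    target_shapes:     {param_name: length} for the model being built
                       (lengths stand in for tensor shapes in this toy).
    checkpoint_params: {param_name: list_of_values} from a pretrained
                       checkpoint (e.g. DistilGPT2 for the decoder).

    Returns the warmed parameter dict and the names of parameters that
    had no matching checkpoint entry and were randomly initialised.
    """
    rng = random.Random(seed)
    warmed, randomly_initialised = {}, []
    for name, length in target_shapes.items():
        pretrained = checkpoint_params.get(name)
        if pretrained is not None and len(pretrained) == length:
            # Name and shape match: copy the pretrained weights.
            warmed[name] = list(pretrained)
        else:
            # No match (e.g. new cross-attention): random init.
            warmed[name] = [rng.gauss(0.0, 0.02) for _ in range(length)]
            randomly_initialised.append(name)
    return warmed, randomly_initialised

# Hypothetical decoder: token embeddings exist in the language-model
# checkpoint, but the encoder-decoder cross-attention does not.
target = {"wte.weight": 4, "cross_attn.weight": 3}
checkpoint = {"wte.weight": [0.1, 0.2, 0.3, 0.4], "lm_head.weight": [1.0]}

params, new_params = warm_start(target, checkpoint)
print(new_params)  # only the cross-attention is randomly initialised
```

In a real PyTorch implementation the same effect is usually achieved with `model.load_state_dict(checkpoint, strict=False)`, which likewise copies matching entries and leaves unmatched parameters at their initial values.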
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Radiology Report Generation | MIMIC-CXR (test) | BLEU-4 | 0.124 | 172 |
| Radiology Report Generation | CheXpert Plus (test) | Precision | 0.285 | 88 |
| Radiology Report Generation | IU-Xray (test) | ROUGE-L | 0.277 | 77 |
| Radiology Report Generation | MIMIC-CXR | ROUGE-L | 28.5 | 57 |
| Medical Report Generation | MIMIC-CXR | BLEU-4 | 0.127 | 43 |
| Radiology Report Generation | CheXpert Plus | ROUGE-L | 0.238 | 37 |
| Medical Report Generation | MIMIC-CXR 2.0.0 (test) | BLEU-4 | 0.127 | 21 |
| Medical Report Generation | IU X-Ray | BLEU-1 | 0.473 | 21 |
| Medical Report Generation | Tuberculosis dataset | F1 Score | 21.28 | 13 |
| Medical Report Generation | Ophthalmology dataset (test) | BLEU-1 | 0.6249 | 11 |