A Vision-Language Foundation Model to Enhance Efficiency of Chest X-ray Interpretation
About
Over 1.4 billion chest X-rays (CXRs) are performed annually due to their cost-effectiveness as an initial diagnostic test. This scale of radiological studies provides a significant opportunity to streamline CXR interpretation and documentation. While foundation models are a promising solution, the lack of publicly available large-scale datasets and benchmarks inhibits their iterative development and real-world evaluation. To overcome these challenges, we constructed a large-scale dataset (CheXinstruct), which we used to train a vision-language foundation model (CheXagent). We systematically demonstrated competitive performance across eight distinct task types on our novel evaluation benchmark (CheXbench). Beyond technical validation, we assessed the real-world utility of CheXagent in directly drafting radiology reports. Our clinical assessment with eight radiologists revealed a 36% time saving for residents using CheXagent-drafted reports, while attending radiologists showed no significant difference in time spent editing resident-drafted versus CheXagent-drafted reports. The CheXagent-drafted reports improved the writing efficiency of radiology residents and attending radiologists in 81% and 61% of cases, respectively, without loss of quality. Overall, we demonstrate that CheXagent can effectively perform a variety of CXR interpretation tasks and holds potential to assist radiologists in routine clinical workflows.
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Radiology Report Generation | MIMIC-CXR (test) | BLEU-4: 0.047 | 121 |
| Image Classification | COVIDx | Accuracy: 34.3 | 57 |
| Visual Question Answering | Chest X-ray VQA (test) | Overall Accuracy: 47.41 | 43 |
| Medical Report Generation | MIMIC-CXR | F1 Score: 31.95 | 22 |
| Chest X-ray Report Generation | MIMIC-CXR (test) | F1 Macro (14): 38.9 | 21 |
| Medical Image Report Labeling | MIMIC-CXR (test) | Macro F1 (14 Labels): 38.9 | 21 |
| Radiology Report Generation | RadVLM MIMIC-CXR (test) | ROUGE-L: 22.5 | 13 |
| Medical Report Generation | IU X-Ray | Precision: 50.37 | 11 |
| Abnormality Detection | CXR | IoU: 31 | 8 |
| Close-Ended Visual Question Answering | CXR | BERTScore: 90 | 8 |
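Several of the benchmarks above report a macro-averaged F1 over 14 finding labels (the standard CheXpert-style label set for MIMIC-CXR). As a minimal sketch of how such a score is computed, the snippet below implements macro F1 for multi-label predictions in plain Python; the toy inputs (and the reduced 3-label example) are hypothetical and only illustrate the metric, not CheXagent's evaluation pipeline.

```python
def per_label_f1(tp, fp, fn):
    """F1 for one label; defined as 0 when the label never occurs."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(y_true, y_pred, num_labels=14):
    """Macro F1: average per-label F1, weighting every label equally.

    y_true, y_pred: lists of 0/1 vectors, one vector per study,
    with one entry per finding label.
    """
    scores = []
    for label in range(num_labels):
        tp = sum(t[label] and p[label] for t, p in zip(y_true, y_pred))
        fp = sum((not t[label]) and p[label] for t, p in zip(y_true, y_pred))
        fn = sum(t[label] and (not p[label]) for t, p in zip(y_true, y_pred))
        scores.append(per_label_f1(tp, fp, fn))
    return sum(scores) / num_labels

# Toy example with 3 labels for brevity (hypothetical data)
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0]]
print(round(macro_f1(y_true, y_pred, num_labels=3), 3))  # → 0.667
```

Because each label contributes equally regardless of prevalence, macro F1 rewards models that handle rare findings as well as common ones, which is why it is a common headline metric for CXR label extraction.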