Knowledge-enhanced Visual-Language Pre-training on Chest Radiology Images
About
While multi-modal foundation models pre-trained on large-scale data have been successful in natural language understanding and visual recognition, their use in medical domains remains limited due to the fine-grained nature of medical tasks and the high demand for domain knowledge. To address this challenge, we propose a novel approach called Knowledge-enhanced Auto Diagnosis (KAD), which leverages existing medical domain knowledge to guide vision-language pre-training using paired chest X-rays and radiology reports. We evaluate KAD on four external X-ray datasets and demonstrate that its zero-shot performance is not only comparable to that of fully supervised models, but also superior to the average of three expert radiologists for three (out of five) pathologies, with statistical significance. Moreover, when few-shot annotation is available, KAD outperforms all existing approaches in fine-tuning settings, demonstrating its potential for application in different clinical scenarios.
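At inference time, zero-shot diagnosis with a vision-language model of this kind typically reduces to comparing an image embedding against text embeddings of per-pathology prompts. The sketch below illustrates that scoring step only; the embedding dimension, pathology list, and random stand-in vectors are illustrative assumptions, not KAD's actual encoders or prompts.

```python
import numpy as np

# Illustrative sketch of zero-shot multi-label scoring with image-text
# embeddings. In a real pipeline the embeddings would come from trained
# image and text encoders; here they are random stand-ins.

rng = np.random.default_rng(0)
EMBED_DIM = 512  # assumed dimension, for illustration only
PATHOLOGIES = ["atelectasis", "cardiomegaly", "edema", "effusion", "pneumothorax"]

def l2_normalize(x, axis=-1):
    """Normalize vectors to unit length so the dot product is cosine similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def zero_shot_scores(image_emb, text_embs):
    """Cosine similarity between one image embedding and per-pathology text
    embeddings, squashed to (0, 1) with a sigmoid for multi-label output."""
    sims = l2_normalize(text_embs) @ l2_normalize(image_emb)
    return 1.0 / (1.0 + np.exp(-sims))

image_emb = rng.standard_normal(EMBED_DIM)
text_embs = rng.standard_normal((len(PATHOLOGIES), EMBED_DIM))

scores = zero_shot_scores(image_emb, text_embs)
for name, s in zip(PATHOLOGIES, scores):
    print(f"{name}: {s:.3f}")
```

Because each pathology is scored independently, this formulation naturally handles the multi-label setting used in the chest X-ray benchmarks below.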
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multi-Label Classification | ChestX-Ray14 (test) | AUROC (%) | 82.5 | 88 |
| Medical Image Classification | MIDRC-XR Portable | AUC | 93.41 | 18 |
| Medical Image Classification | MIDRC-XR | AUC | 85.74 | 18 |
| Multi-label CXR Classification | Open-i (test) | AUC | 0.807 | 8 |
| Multi-label CXR Classification | PadChest (test) | AUC | 0.75 | 8 |
| Multi-label CXR Classification | PadChest20 (test) | AUC | 0.735 | 8 |
| Multi-label CXR Classification | CheXpert (test) | AUC | 90.5 | 8 |
| Multi-label CXR Classification | ChestXDet10 (test) | AUC | 0.735 | 8 |
| Anatomy correspondence | Chest-Landmark | Anatomical Structure Matching Error | 296.5 | 7 |
| Image Classification | CXR14 | AUC | 0.789 | 6 |