Advancing Radiograph Representation Learning with Masked Record Modeling
About
Modern studies in radiograph representation learning (R$^2$L) rely on either self-supervision to encode invariant semantics or associated radiology reports to incorporate medical expertise, while the complementarity between the two has received little attention. To explore this, we formulate self-completion and report-completion as two complementary objectives and present a unified framework based on masked record modeling (MRM). In practice, MRM reconstructs masked image patches and masked report tokens following a multi-task scheme to learn knowledge-enhanced semantic representations. With MRM pre-training, we obtain pre-trained models that transfer well to various radiography tasks. Specifically, we find that MRM offers superior performance in label-efficient fine-tuning. For instance, MRM achieves 88.5% mean AUC on CheXpert using 1% labeled data, outperforming previous R$^2$L methods that use 100% of the labels. On NIH ChestX-ray, MRM outperforms the best-performing counterpart by about 3% under small labeling ratios. In addition, MRM surpasses self- and report-supervised pre-training in identifying the pneumonia type and the pneumothorax area, sometimes by large margins.
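The multi-task objective described above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names, the loss weights, and the choice of mean squared error for patches and cross-entropy for tokens are assumptions made for illustration, and the encoder and decoders that would produce the predictions are left out entirely.

```python
import math
import random


def random_mask(n, ratio, rng):
    """Pick round(n * ratio) positions to hide, uniformly at random."""
    k = int(round(n * ratio))
    return sorted(rng.sample(range(n), k))


def masked_mse(pred, target, masked_idx):
    """Mean squared error over the masked patches only.

    Each patch is a list of floats (e.g. flattened pixel values).
    """
    total, count = 0.0, 0
    for i in masked_idx:
        for p, t in zip(pred[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0


def masked_cross_entropy(logits, token_ids, masked_idx):
    """Mean negative log-likelihood of the true token at masked positions."""
    total = 0.0
    for i in masked_idx:
        m = max(logits[i])  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in logits[i]))
        total += log_z - logits[i][token_ids[i]]
    return total / len(masked_idx) if masked_idx else 0.0


def mrm_style_loss(patch_pred, patches, patch_mask,
                   token_logits, tokens, token_mask,
                   image_weight=1.0, report_weight=1.0):
    """Weighted sum of the two reconstruction losses (weights are assumed)."""
    image_loss = masked_mse(patch_pred, patches, patch_mask)
    report_loss = masked_cross_entropy(token_logits, tokens, token_mask)
    return image_weight * image_loss + report_weight * report_loss


# Toy usage: 16 patches of 4 values each, 8 report tokens, vocabulary of 4.
rng = random.Random(0)
patches = [[rng.random() for _ in range(4)] for _ in range(16)]
tokens = [rng.randrange(4) for _ in range(8)]
patch_mask = random_mask(len(patches), 0.75, rng)  # ratio is illustrative
token_mask = random_mask(len(tokens), 0.5, rng)    # ratio is illustrative
```

Both losses are computed only at masked positions, so visible patches and tokens act as context rather than as targets. The two mask ratios in the usage lines are placeholders, not values taken from the abstract.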
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Multi-Label Classification | ChestX-Ray14 (test) | AUROC (%) | 79.4 | 88 |
| Medical Image Classification | RSNA | AUC | 93.3 | 36 |
| Medical Image Classification | Covidx | Accuracy | 90.8 | 36 |
| Medical Image Classification | CheXpert | AUC | 88.7 | 36 |
| Classification | RSNA | Accuracy | 78.77 | 29 |
| Classification | CheXpert 5x200 1.0 | Accuracy | 58.26 | 27 |
| Classification | Rad-ChestCT | AUC | 72.6 | 25 |
| Classification | CC-CCII | Accuracy | 90.3 | 24 |
| Classification | CT-RATE | AUC | 0.821 | 24 |
| Medical Image Classification | MIDRC-XR Portable | AUC | 96.52 | 18 |