RadLing: Towards Efficient Radiology Report Understanding
About
Most natural language tasks in the radiology domain use language models pretrained on biomedical corpora. Few pretrained language models are trained specifically for radiology, and fewer still have been trained in a low-data setting and gone on to produce comparable results on fine-tuning tasks. We present RadLing, a continuously pretrained language model based on the Electra-small (Clark et al., 2020) architecture and trained on over 500K radiology reports, that can compete with state-of-the-art results on fine-tuning tasks in the radiology domain. Our main contribution in this paper is knowledge-aware masking, a taxonomic knowledge-assisted pretraining task that dynamically masks tokens to inject knowledge during pretraining. In addition, we introduce a knowledge base-aided vocabulary extension to adapt the general tokenization vocabulary to the radiology domain.
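The abstract does not spell out how knowledge-aware masking selects tokens, so the following is only a minimal sketch of the general idea, assuming that tokens found in a taxonomy are masked with a higher probability than other tokens. The function name `knowledge_aware_mask`, the `knowledge_terms` set, and the probabilities `p_known` and `p_other` are all hypothetical and are not taken from the paper.

```python
import random

MASK = "[MASK]"

def knowledge_aware_mask(tokens, knowledge_terms, p_known=0.5, p_other=0.1, rng=None):
    """Mask tokens, favouring those that appear in a taxonomy (assumed scheme).

    tokens          -- list of string tokens from one report sentence
    knowledge_terms -- hypothetical set of lower-cased taxonomy terms
    p_known/p_other -- illustrative masking probabilities, not from the paper
    """
    rng = rng or random.Random()
    masked, labels = [], []
    for tok in tokens:
        # Taxonomy terms get the higher masking probability.
        p = p_known if tok.lower() in knowledge_terms else p_other
        if rng.random() < p:
            masked.append(MASK)
            labels.append(tok)   # the model would be trained to recover this token
        else:
            masked.append(tok)
            labels.append(None)  # position is not a prediction target
    return masked, labels

if __name__ == "__main__":
    sentence = ["No", "pleural", "effusion", "is", "seen"]
    terms = {"pleural", "effusion"}  # stand-in for a radiology taxonomy
    print(knowledge_aware_mask(sentence, terms))
```

Because the mask is drawn afresh on every call, the same sentence can be masked differently across epochs, which is one plausible reading of "dynamically masks tokens".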
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Named Entity Recognition | RadGraph (MIMIC) | Macro F1 | 92 | 5 |
| Named Entity Recognition | RadGraph (CheXpert) | Macro F1 | 92 | 5 |
| Relationship Extraction | RadGraph (MIMIC) | Macro F1 | 98 | 5 |
| Relationship Extraction | RadGraph (CheXpert) | Macro F1 | 0.94 | 5 |
| Abnormal classification | Demner-Fushman dataset | Macro F1 | 99 | 5 |
| Radiology Question Answering | RadQA | F1 Score | 62.55 | 5 |