A Parameter-Efficient Transfer Learning Approach through Multitask Prompt Distillation and Decomposition for Clinical NLP
About
Existing prompt-based fine-tuning methods typically learn task-specific prompts independently, which imposes significant computation and storage overhead at scale when deploying multiple clinical natural language processing (NLP) systems. We present a multitask prompt distillation and decomposition framework that learns a single shared metaprompt from 21 diverse clinical source tasks and adapts it to unseen target tasks with fewer than 0.05% of the parameters trainable. Evaluated across five clinical NLP task types (named entity recognition, relation extraction, question answering, natural language inference, and summarization) on 10 held-out target datasets with three backbone models (LLaMA 3.1 8B, Meditron3 8B, gpt-oss 20B), our framework consistently outperforms LoRA by 1.5% to 1.7% despite using orders of magnitude fewer parameters, and exceeds single-task prompt tuning by 6.1% to 6.6%. The gpt-oss 20B backbone achieves the highest overall performance, particularly on clinical reasoning tasks. Strong zero- and few-shot results further demonstrate the transferability of the shared prompt representation.
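To make the "fewer than 0.05% trainable parameters" claim concrete, here is a minimal sketch of prompt decomposition: each target task's soft prompt is composed from a frozen shared metaprompt plus a low-rank, task-specific update, so only the low-rank factors are trained per task. The prompt length, hidden size, and rank below are illustrative assumptions, not values from the paper, and this is not the authors' implementation.

```python
import numpy as np

# Assumed shapes for illustration: a 100-token soft prompt over a
# 4096-dim hidden size (typical for an 8B backbone), rank-2 update.
PROMPT_LEN, HIDDEN, RANK = 100, 4096, 2

rng = np.random.default_rng(0)

# Shared metaprompt: distilled once from the source tasks, frozen afterward.
meta_prompt = rng.normal(size=(PROMPT_LEN, HIDDEN))

# Per-task trainable factors of the low-rank update (u @ v).
u = np.zeros((PROMPT_LEN, RANK))          # initialized so the update starts at 0
v = rng.normal(scale=0.01, size=(RANK, HIDDEN))

def target_prompt(u, v):
    """Compose a task prompt: frozen metaprompt + low-rank task update."""
    return meta_prompt + u @ v

trainable = u.size + v.size               # only u and v are updated per task
backbone = 8_000_000_000                  # frozen 8B backbone, for scale
print(f"per-task trainable params: {trainable} "
      f"({trainable / backbone:.6%} of the backbone)")
```

With these assumed shapes the per-task budget is rank * (prompt_len + hidden) = 8,392 parameters, comfortably under the 0.05% ceiling, versus 409,600 for a full single-task soft prompt.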
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Medical Question Answering | Medbullets | Accuracy | 68.9 | 65 |
| Question Answering | HeadQA | Accuracy | 64.4 | 14 |
| Multi-task Evaluation | Aggregated Clinical Tasks | Average Score | 73.9 | 12 |
| Named Entity Recognition | n2c2 University of Washington (UW) 2022 | F1 Score | 87.1 | 12 |
| Named Entity Recognition | UFHealth Opioid use dataset | F1 Score | 91.4 | 12 |
| Natural Language Inference | SciNLI | F1 Score | 86.5 | 12 |
| Natural Language Inference | RadNLI | F1 Score | 82.3 | 12 |
| Relation Extraction | n2c2 2022 (University of Washington) | F1 Score | 83.5 | 12 |
| Relation Extraction | UFHealth Opioid use dataset | F1 Score | 89.3 | 12 |
| Summarization | RadNLI | ROUGE-L | 39.1 | 12 |