MedBayes-Lite: A Clinical Uncertainty Governance Layer for Risk-Aware Medical Decision Support

About

Clinical language models often assign high confidence to incorrect predictions, particularly in high-severity and out-of-distribution cases. We present MedBayes-Lite, a retraining-free uncertainty governance layer for transformer-based clinical predictors. It combines Monte Carlo dropout, predictive calibration, and confidence-guided abstention to defer low-confidence predictions for human review, adding no trainable parameters. Evaluated on MedMCQA and MedQA-USMLE, MedBayes-Lite reduces expected calibration error by 0.23 to 0.33 and drives harmful overconfident errors (confident, incorrect, high-severity predictions) toward zero. Under domain shift from MedMCQA to MedQA-USMLE, it reduces confident high-severity errors from about 21% to near zero while roughly halving calibration drift. We also introduce the Clinical Uncertainty Score (CUS), which strongly correlates with harmful overconfidence (r approximately 0.88). Although the framework does not improve risk-coverage ranking, and temperature scaling or deep ensembles may provide advantages in calibration cost or risk ranking, MedBayes-Lite offers a practical calibration-and-abstention layer that reduces confident high-severity errors in clinical question-answering benchmarks.

Elias Hossain, Md Mehedi Hasan Nipu, Maleeha Sheikh, Tasfia Nuzhat, Rajib Rana, Subash Neupane, Björn W. Schuller, Niloofar Yousefi • 2025
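
The abstract names three ingredients: Monte Carlo dropout, confidence-guided abstention, and expected calibration error (ECE) as the calibration measure. The sketch below illustrates each in NumPy. It is not the authors' implementation. It assumes a toy linear classifier with dropout applied to the input features (a real transformer would keep its own dropout layers active at inference), an arbitrary abstention threshold of 0.7, and -1 as a hypothetical "defer to human review" code. All function names are illustrative, and the calibration step itself is not modelled here, only its measurement.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mc_dropout_predict(x, W, b, p_drop=0.2, n_samples=50, rng=None):
    """Average class probabilities over stochastic forward passes.

    Assumption: a toy linear model (x @ W + b) with inverted dropout on
    the inputs stands in for a transformer with dropout left on.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    probs = []
    for _ in range(n_samples):
        keep = rng.random(x.shape) >= p_drop      # keep each feature with prob 1 - p_drop
        h = x * keep / (1.0 - p_drop)             # rescale so the expectation is unchanged
        probs.append(softmax(h @ W + b))
    return np.mean(probs, axis=0)                 # shape: (n_examples, n_classes)

def predict_or_abstain(mean_probs, threshold=0.7):
    """Return the argmax class, or -1 when confidence is below the threshold."""
    conf = mean_probs.max(axis=-1)
    pred = mean_probs.argmax(axis=-1)
    return np.where(conf >= threshold, pred, -1), conf

def expected_calibration_error(conf, correct, n_bins=10):
    """Bin-weighted mean of |accuracy - confidence| over equal-width bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - conf[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece
```

A typical use would call `mc_dropout_predict` on a batch, pass the averaged probabilities to `predict_or_abstain`, and compute `expected_calibration_error` on the confidences of the retained predictions against their correctness. The threshold trades coverage for safety: raising it defers more cases to human review.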

Related benchmarks

Task	Dataset	Metric	Value	(unlabeled)
Medical Question Answering	PubMedQA (test)	CUS Score	7.11	4
Biomedical Question Answering	PubMedQA (test)	CUS	0.254	2
Clinical Text Analysis	MIMIC-III	CUS	0.38	2
Medical Question Answering	MedQA	CUS	31	2
Medical Question Answering	PubMedQA	CUS Score	66	2
Biomedical Question Answering	MedQA (test)	CUS	28.9	2
Medical Question Answering	MIMIC-III (test)	CUS Score	23.12	2
