Mind the Ambiguity: Aleatoric Uncertainty Quantification in LLMs for Safe Medical Question Answering

About

The deployment of Large Language Models in Medical Question Answering is severely hampered by ambiguous user queries, a significant safety risk that demonstrably reduces answer accuracy in high-stakes healthcare settings. In this paper, we formalize this challenge by linking input ambiguity to aleatoric uncertainty (AU), which is the irreducible uncertainty arising from underspecified input. To facilitate research in this direction, we construct CV-MedBench, the first benchmark designed for studying input ambiguity in Medical QA. Using this benchmark, we analyze AU from a representation engineering perspective, revealing that AU is linearly encoded in LLM's internal activation patterns. Leveraging this insight, we introduce a novel AU-guided "Clarify-Before-Answer" framework, which incorporates AU-Probe - a lightweight module that detects input ambiguity directly from hidden states. Unlike existing uncertainty estimation methods, AU-Probe requires neither LLM fine-tuning nor multiple forward passes, enabling an efficient mechanism to proactively request user clarification and significantly enhance safety. Extensive experiments across four open LLMs demonstrate the effectiveness of our QA framework, with an average accuracy improvement of 9.48% over baselines. Our framework provides an efficient and robust solution for safe Medical QA, strengthening the reliability of health-related applications. The code is available at https://github.com/yaokunliu/AU-Med.git, and the CV-MedBench dataset is released on Hugging Face at https://huggingface.co/datasets/yaokunl/CV-MedBench.

Yaokun Liu, Yifan Liu, Phoebe Mbuvi, Zelin Li, Ruichen Yao, Gawon Lim, Dong Wang• 2026

Related benchmarks

Task	Dataset	Result
Medical Question Answering	MedMCQA (test)	Accuracy73.9	134
Medical Question Answering	CV-MedQA (test)	AUROC0.9998	28
Medical Question Answering	CV-MedMCQA (test)	AUROC0.9999	28
Medical Question Answering	CV-MedExQA (test)	AUROC0.9987	28
Medical Question Answering	CV-MedQA ambiguous (test)	Accuracy0.7643	12
Medical Question Answering	CV-MedExQA ambiguous (test)	Accuracy73.09	12

Showing 6 of 6 rows

Other info

Follow for update

@wizwand_team Discord