Mind the Ambiguity: Aleatoric Uncertainty Quantification in LLMs for Safe Medical Question Answering

About

The deployment of Large Language Models (LLMs) in Medical Question Answering (QA) is severely hampered by ambiguous user queries, a significant safety risk that demonstrably reduces answer accuracy in high-stakes healthcare settings. In this paper, we formalize this challenge by linking input ambiguity to aleatoric uncertainty (AU), which is the irreducible uncertainty arising from underspecified input. To facilitate research in this direction, we construct CV-MedBench, the first benchmark designed for studying input ambiguity in Medical QA. Using this benchmark, we analyze AU from a representation engineering perspective, revealing that AU is linearly encoded in LLMs' internal activation patterns. Leveraging this insight, we introduce a novel AU-guided "Clarify-Before-Answer" framework, which incorporates AU-Probe, a lightweight module that detects input ambiguity directly from hidden states. Unlike existing uncertainty estimation methods, AU-Probe requires neither LLM fine-tuning nor multiple forward passes, enabling an efficient mechanism to proactively request user clarification and significantly enhance safety. Extensive experiments across four open LLMs demonstrate the effectiveness of our QA framework, with an average accuracy improvement of 9.48% over baselines. Our framework provides an efficient and robust solution for safe Medical QA, strengthening the reliability of health-related applications. The code is available at https://github.com/yaokunliu/AU-Med.git, and the CV-MedBench dataset is released on Hugging Face at https://huggingface.co/datasets/yaokunl/CV-MedBench.

Yaokun Liu, Yifan Liu, Phoebe Mbuvi, Zelin Li, Ruichen Yao, Gawon Lim, Dong Wang • 2026
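
The abstract describes AU-Probe only at a high level: a lightweight module that reads ambiguity from hidden states, used to decide whether to ask for clarification before answering. The sketch below illustrates that general idea and is not the authors' implementation (the linked repository is the reference for that). It assumes the probe is a logistic-regression classifier over a single hidden-state vector, and it uses synthetic vectors in place of real LLM activations. All function names, the 0.5 threshold, and the data generator are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def ambiguity_score(w, b, hidden_state):
    """Probability that an input is ambiguous, from one linear readout."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, hidden_state)) + b)


def train_linear_probe(states, labels, lr=0.5, epochs=200):
    """Fit logistic-regression weights with batch gradient descent."""
    dim, n = len(states[0]), len(states)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(states, labels):
            # Gradient of the log loss with respect to the logit.
            err = ambiguity_score(w, b, x) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def route_query(score, threshold=0.5):
    """Hypothetical gate: ask for clarification when the score is high."""
    return "clarify" if score >= threshold else "answer"


def make_synthetic_states(n, dim, rng):
    """Synthetic stand-in for hidden states (assumption, not real data).

    Ambiguous examples (label 1) are shifted along a fixed direction and
    clear examples (label 0) the opposite way, so the label is linearly
    decodable by construction.
    """
    direction = [1.0 if i % 2 == 0 else -1.0 for i in range(dim)]
    states, labels = [], []
    for k in range(n):
        label = k % 2
        shift = 1.5 if label == 1 else -1.5
        states.append([rng.gauss(0.0, 1.0) + shift * d for d in direction])
        labels.append(label)
    return states, labels


if __name__ == "__main__":
    rng = random.Random(0)
    train_x, train_y = make_synthetic_states(200, 8, rng)
    w, b = train_linear_probe(train_x, train_y)
    query_state = make_synthetic_states(1, 8, rng)[0][0]
    score = ambiguity_score(w, b, query_state)
    print(f"ambiguity score {score:.3f} -> {route_query(score)}")
```

In a real setting the synthetic vectors would be replaced by activations taken from one forward pass of the LLM, which is what lets a probe of this kind avoid fine-tuning and repeated sampling. Which layer and token position to read from is a design choice this sketch does not address.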

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Medical Question Answering | MedMCQA (test) | Accuracy | 73.9 | 134 |
| Medical Question Answering | CV-MedQA (test) | AUROC | 0.9998 | 28 |
| Medical Question Answering | CV-MedMCQA (test) | AUROC | 0.9999 | 28 |
| Medical Question Answering | CV-MedExQA (test) | AUROC | 0.9987 | 28 |
| Medical Question Answering | CV-MedQA ambiguous (test) | Accuracy | 0.7643 | 12 |
| Medical Question Answering | CV-MedExQA ambiguous (test) | Accuracy | 73.09 | 12 |
