
Do LLMs Act Like Rational Agents? Measuring Belief Coherence in Probabilistic Decision Making

About

Large language models (LLMs) are increasingly deployed as agents in high-stakes domains where optimal actions depend on both uncertainty about the world and the utilities of different outcomes, yet their decision logic remains difficult to interpret. We study whether LLMs behave as rational utility maximizers with coherent beliefs and stable preferences, examining model behavior on diagnostic challenge problems. The results characterize how LLM inferences relate to ideal Bayesian utility maximization, connecting elicited probabilities to observed actions. Our approach provides falsifiable conditions under which the reported probabilities cannot correspond to the true beliefs of any rational agent. We apply this methodology to multiple medical diagnostic domains with evaluations across several LLMs, and we discuss implications of the results and directions forward for the use of LLMs in guiding high-stakes decisions.
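As a rough illustration of the rational-agent baseline the abstract refers to, the sketch below compares a model's chosen action against the action an ideal Bayesian utility maximizer would take given the model's own elicited probabilities. The probability values, utility matrix, and function name are hypothetical placeholders, not the paper's actual protocol.

```python
import numpy as np

def bayes_optimal_action(probs: np.ndarray, utilities: np.ndarray) -> int:
    """Return the action maximizing expected utility.

    probs     : shape (n_states,), elicited P(state | evidence)
    utilities : shape (n_actions, n_states), U(action, state)
    """
    expected = utilities @ probs  # expected utility of each action
    return int(np.argmax(expected))

# Hypothetical example: two states (disease present / absent) and
# two actions (treat / don't treat).
elicited = np.array([0.3, 0.7])  # model-reported P(disease), P(no disease)
U = np.array([
    [ 10.0, -2.0],   # treat:    helps if diseased, small cost otherwise
    [-50.0,  0.0],   # no treat: large loss if diseased, neutral otherwise
])

rational = bayes_optimal_action(elicited, U)
observed = 1  # suppose the model chose "no treat"

# A rational agent holding these beliefs would pick the argmax action;
# a mismatch means the reported probabilities cannot be the beliefs
# driving the observed choice.
print("rational:", rational, "| observed:", observed,
      "| coherent:", rational == observed)
```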

Khurram Yamin, Jingjing Tang, Santiago Cortes-Gomez, Amit Sharma, Eric Horvitz, Bryan Wilder • 2026

Related benchmarks

Task | Dataset | Result | Rank
Conditional-independence (belief sufficiency) testing | Heart (Expert-Constructed Bayesian Network) | -- | 4
Conditional-independence (belief sufficiency) testing | Cry | -- | 4
Conditional-independence (belief sufficiency) testing | Fever (Expert-Constructed Bayesian Network) | -- | 4
Conditional-independence (belief sufficiency) testing | Diab (Diabetes Patient Records) | -- | 4
Monotone Choice Probabilities | Heart | -- | 4
Monotone Choice Probabilities | Cry | -- | 4
Monotone Choice Probabilities | Fever | -- | 4
Monotone Choice Probabilities | Diab | -- | 4
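A minimal sketch of the conditional-independence (belief sufficiency) idea behind the first task family above: if a model's reported probability is a sufficient summary of its beliefs, its actions should be independent of the underlying evidence once that probability is fixed. The trial records, column layout, and evidence labels here are illustrative assumptions, not the benchmark's actual data format.

```python
from collections import defaultdict
from scipy.stats import chi2_contingency

# Hypothetical trial records: (reported_prob, evidence_variant, action),
# where action 1 = treat, 0 = don't treat. Real data would come from
# repeated model queries; these values are made up for illustration.
trials = [
    (0.7, "labs_only", 1), (0.7, "labs_only", 1), (0.7, "labs_only", 0),
    (0.7, "labs_and_notes", 0), (0.7, "labs_and_notes", 0), (0.7, "labs_and_notes", 1),
]

# Bucket action counts by (probability bin, evidence variant).
counts = defaultdict(lambda: [0, 0])  # key -> [n_action0, n_action1]
for p, ev, a in trials:
    counts[(round(p, 1), ev)][a] += 1

# Within each probability bin, test whether the action distribution
# depends on the evidence variant. Under belief sufficiency, the
# reported probability screens off the evidence, so no dependence
# should remain.
for b in sorted({b for b, _ in counts}):
    variants = sorted(ev for bb, ev in counts if bb == b)
    table = [counts[(b, ev)] for ev in variants]
    if len(table) > 1:
        _, pval, _, _ = chi2_contingency(table)
        print(f"prob bin {b}: p-value for action independent of evidence = {pval:.3f}")
```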
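And a sketch of the monotonicity idea behind the second task family: for a fixed utility structure, a rational agent's probability of choosing the high-stakes action (e.g., treat) should be non-decreasing in its reported disease probability. The choice-curve values below are made up to show one violation.

```python
# Hypothetical (reported disease probability, fraction of trials where the
# model chose "treat") pairs, aggregated per probability level.
choice_curve = [
    (0.1, 0.05), (0.2, 0.10), (0.3, 0.40),
    (0.4, 0.35),  # dip: a monotonicity violation
    (0.5, 0.60), (0.7, 0.85), (0.9, 0.95),
]

# Check that P(choose treat) is non-decreasing in the elicited belief.
violations = [
    (p_lo, p_hi)
    for (p_lo, c_lo), (p_hi, c_hi) in zip(choice_curve, choice_curve[1:])
    if c_hi < c_lo
]
print("monotonicity violations between belief levels:", violations)
```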
