BASIL: Bayesian Assessment of Sycophancy in LLMs

About

Sycophancy (overly agreeable or flattering behavior) poses a fundamental challenge for human-AI collaboration, particularly in high-stakes decision-making domains such as health, law, and education. A central difficulty in studying sycophancy in large language models (LLMs) is disentangling sycophantic belief shifts from rational changes in behavior driven by new evidence or user-provided information. Existing approaches either measure descriptive behavior changes or apply normative evaluations that rely on objective ground truth, limiting their applicability to subjective or uncertain tasks. We introduce a Bayesian probabilistic framework, grounded in behavioral economics and rational decision theory, that explicitly separates sycophancy from rational belief updating. Within this framework, we achieve three objectives: (i) a descriptive metric that measures sycophancy while controlling for rational responses to evidence; (ii) a normative metric that quantifies how sycophancy leads models astray from Bayesian-consistent belief updating; and (iii) the ability to apply both metrics in settings without ground-truth labels. Applying our framework across multiple LLMs and three uncertainty-driven tasks, we find robust evidence of sycophantic belief shifts and show that their impact on rationality depends on whether models systematically over- or under-update their beliefs. Finally, we demonstrate that a post-hoc calibration method and two fine-tuning strategies (SFT and DPO) substantially reduce Bayesian inconsistency, with particularly strong improvements under explicit sycophancy prompting.
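As a rough illustration of what "Bayesian-consistent belief updating" and a change in Bayesian error could mean in practice, the sketch below compares a reported posterior against the posterior implied by Bayes' rule, then takes the difference in RMSE between a neutral prompt and a prompt that states the user's belief. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the record fields, the binary-hypothesis setup, and all numeric values are hypothetical, and it assumes the prior, likelihoods, and posterior are all elicited from the model itself so that no ground-truth label is needed.

```python
import math


def bayes_posterior(prior, lik_if_true, lik_if_false):
    """Posterior P(H | E) for a binary hypothesis H via Bayes' rule."""
    numerator = prior * lik_if_true
    evidence = numerator + (1.0 - prior) * lik_if_false
    return numerator / evidence


def bayesian_rmse(records):
    """RMSE between reported posteriors and Bayes-implied posteriors."""
    squared_errors = [
        (
            r["reported_posterior"]
            - bayes_posterior(r["prior"], r["lik_if_true"], r["lik_if_false"])
        )
        ** 2
        for r in records
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical elicited probabilities (illustrative values, not from the paper).
neutral = [
    {"prior": 0.5, "lik_if_true": 0.8, "lik_if_false": 0.4,
     "reported_posterior": 0.70},
]
# Same evidence, but the prompt also states the user's belief (assumed setup).
user_belief = [
    {"prior": 0.5, "lik_if_true": 0.8, "lik_if_false": 0.4,
     "reported_posterior": 0.90},
]

# Positive values mean the belief-stating prompt moved the model further
# from the Bayes-implied posterior; negative values mean it moved closer.
change_in_error = bayesian_rmse(user_belief) - bayesian_rmse(neutral)
print(round(change_in_error, 3))
```

Under this reading, the sign of the change depends on where the model started: a model that under-updates in the neutral condition can end up closer to the Bayes-implied posterior when nudged by a stated belief, while a model that already over-updates moves further away.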

Katherine Atwell, Pedram Heydari, Anthony Sicilia, Malihe Alikhani • 2025

Related benchmarks

Task	Dataset	Metric	Result	
Sycophancy Assessment	BASIL 1.0 (All)	Change in Bayesian Error (RMSE)	-0.046	32
Sycophancy Assessment	BASIL 1.0 (Under-Update)	Change in Bayesian Error (RMSE)	-0.234	32
Sycophancy Assessment	BASIL Over-Update 1.0	Change in Bayesian Error (RMSE)	0.061	32
Sycophancy Evaluation	Sycophancy Evaluation Dataset	Total Sycophancy Score	0.341	32
Bayesian Assessment of Sycophancy	BASIL Abstract setting 1.0 (test)	--	--	18
Bayesian Assessment of Sycophancy	BASIL Third-p. belief setting 1.0 (test)	--	--	18
Bayesian Assessment of Sycophancy	BASIL User belief setting 1.0 (test)	--	--	18

