BASIL: Bayesian Assessment of Sycophancy in LLMs

About

Sycophancy (overly agreeable or flattering behavior) poses a fundamental challenge for human-AI collaboration, particularly in high-stakes decision-making domains such as health, law, and education. A central difficulty in studying sycophancy in large language models (LLMs) is disentangling sycophantic belief shifts from rational changes in behavior driven by new evidence or user-provided information. Existing approaches either measure descriptive behavior changes or apply normative evaluations that rely on objective ground truth, limiting their applicability to subjective or uncertain tasks. We introduce a Bayesian probabilistic framework, grounded in behavioral economics and rational decision theory, that explicitly separates sycophancy from rational belief updating. Within this framework, we achieve three objectives: (i) a descriptive metric that measures sycophancy while controlling for rational responses to evidence; (ii) a normative metric that quantifies how sycophancy leads models astray from Bayesian-consistent belief updating; and (iii) the ability to apply both metrics in settings without ground-truth labels. Applying our framework across multiple LLMs and three uncertainty-driven tasks, we find robust evidence of sycophantic belief shifts and show that their impact on rationality depends on whether models systematically over- or under-update their beliefs. Finally, we demonstrate that a post-hoc calibration method and two fine-tuning strategies (SFT and DPO) substantially reduce Bayesian inconsistency, with particularly strong improvements under explicit sycophancy prompting.
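As a rough illustration of what "Bayesian-consistent belief updating" can mean in practice, the sketch below compares a model's reported posterior with the posterior that Bayes' rule implies from its own stated prior and likelihoods, and summarizes the gap as an RMSE. This is a minimal sketch under stated assumptions, not the paper's definition of its metrics: the record fields (`prior`, `lik_h`, `lik_not_h`, `reported_posterior`), the function names, and the binary-hypothesis setup are all hypothetical choices made for illustration.

```python
import math


def bayes_posterior(prior, lik_h, lik_not_h):
    """Return P(H | E) from Bayes' rule for a binary hypothesis H.

    prior     -- P(H)
    lik_h     -- P(E | H)
    lik_not_h -- P(E | not H)
    """
    numerator = lik_h * prior
    denominator = numerator + lik_not_h * (1.0 - prior)
    if denominator == 0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return numerator / denominator


def bayesian_rmse(records):
    """RMSE between reported posteriors and Bayes-implied posteriors.

    Each record is a dict with the hypothetical keys
    'prior', 'lik_h', 'lik_not_h', and 'reported_posterior'.
    """
    squared_errors = [
        (r["reported_posterior"]
         - bayes_posterior(r["prior"], r["lik_h"], r["lik_not_h"])) ** 2
        for r in records
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def change_in_bayesian_error(neutral_records, sycophancy_records):
    """Error under a sycophancy-inducing prompt minus error under a neutral one.

    Positive values mean the sycophancy condition moved the model further
    from Bayes-consistent updating; negative values mean it moved closer.
    """
    return bayesian_rmse(sycophancy_records) - bayesian_rmse(neutral_records)


# Toy numbers (invented for illustration, not taken from the paper).
neutral = [{"prior": 0.5, "lik_h": 0.8, "lik_not_h": 0.2,
            "reported_posterior": 0.8}]
sycophantic = [{"prior": 0.5, "lik_h": 0.8, "lik_not_h": 0.2,
                "reported_posterior": 0.9}]
delta = change_in_bayesian_error(neutral, sycophantic)
```

Because the reference posterior is computed from the model's own stated probabilities, a comparison of this kind needs no ground-truth label, which is the property the abstract highlights for subjective or uncertain tasks.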

Katherine Atwell, Pedram Heydari, Anthony Sicilia, Malihe Alikhani • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Sycophancy Assessment | BASIL 1.0 (All) | Change in Bayesian Error (RMSE): -0.046 | 32 |
| Sycophancy Assessment | BASIL 1.0 (Under-Update) | Change in Bayesian Error (RMSE): -0.234 | 32 |
| Sycophancy Assessment | BASIL Over-Update 1.0 | Change in Bayesian Error (RMSE): 0.061 | 32 |
| Sycophancy Evaluation | Sycophancy Evaluation Dataset | Total Sycophancy Score: 0.341 | 32 |
| Bayesian Assessment of Sycophancy | BASIL Abstract setting 1.0 (test) | -- | 18 |
| Bayesian Assessment of Sycophancy | BASIL Third-p. belief setting 1.0 (test) | -- | 18 |
| Bayesian Assessment of Sycophancy | BASIL User belief setting 1.0 (test) | -- | 18 |