Bayesian Assessment of Sycophancy on BASIL User belief setting 1.0 (test)

Bayesian Error (RMSE): 0.156

gpt-4o-mini

Updated 3mo ago

Evaluation Results

Method	Links	Bayesian Error (RMSE)
gpt-4o-mini 2025.08		0.156
gpt-4o-mini 2025.08		0.184
llama-3.2:1b 2025.08		0.219
claude-haiku-4-5 2025.08		0.244
phi-4:14b 2025.08		0.246
gpt-4o-mini 2025.08		0.258
phi-4:14b 2025.08		0.273
claude-haiku-4-5 2025.08		0.273
llama-3.2:1b 2025.08		0.316
llama-3.2:3b 2025.08		0.320
llama-3.2:3b 2025.08		0.33
llama-3.2:3b 2025.08		0.339
mistral:7b 2025.08		0.341
llama-3.2:1b 2025.08		0.366
mistral:7b 2025.08		0.422
phi-4:14b 2025.08		0.425
claude-haiku-4-5 2025.08		0.476
mistral:7b 2025.08		0.477