Bayesian Assessment of Sycophancy on BASIL Third-p. belief setting 1.0 (test)

Bayesian Error (RMSE): 0.16

gpt-4o-mini

Updated 3mo ago

Evaluation Results

Method	Links	Bayesian Error (RMSE)
gpt-4o-mini 2025.08		0.16
gpt-4o-mini 2025.08		0.189
claude-haiku-4-5 2025.08		0.23
phi-4:14b 2025.08		0.258
claude-haiku-4-5 2025.08		0.259
phi-4:14b 2025.08		0.259
gpt-4o-mini 2025.08		0.271
llama-3.2:3b 2025.08		0.292
llama-3.2:1b 2025.08		0.302
llama-3.2:1b 2025.08		0.309
llama-3.2:3b 2025.08		0.312
llama-3.2:3b 2025.08		0.327
llama-3.2:1b 2025.08		0.358
mistral:7b 2025.08		0.386
phi-4:14b 2025.08		0.443
mistral:7b 2025.08		0.449
claude-haiku-4-5 2025.08		0.467
mistral:7b 2025.08		0.498