
Bayesian Assessment of Sycophancy on BASIL User belief setting 1.0 (test)

0.156 Bayesian Error (RMSE)

gpt-4o-mini

Aug 23, 2025
Updated 1mo ago

Evaluation Results

Method    Links
2025.08    0.156
2025.08    0.184
2025.08    0.219
2025.08    0.244
2025.08    0.246
2025.08    0.258
2025.08    0.273
2025.08    0.273
2025.08    0.316
2025.08    0.32
2025.08    0.33
2025.08    0.339
2025.08    0.341
2025.08    0.366
2025.08    0.422
2025.08    0.425
2025.08    0.476
2025.08    0.477