
Bayesian Assessment of Sycophancy on BASIL Third-p. belief setting 1.0 (test)

Bayesian Error (RMSE): 0.16

gpt-4o-mini

[Chart: y-axis ticks 0.14648, 0.23774, 0.329, 0.42026; x-axis tick Aug 23, 2025]
Updated 1mo ago

Evaluation Results

Date       Bayesian Error (RMSE)
2025.08    0.16
2025.08    0.189
2025.08    0.23
2025.08    0.258
2025.08    0.259
2025.08    0.259
2025.08    0.271
2025.08    0.292
2025.08    0.302
2025.08    0.309
2025.08    0.312
2025.08    0.327
2025.08    0.358
2025.08    0.386
2025.08    0.443
2025.08    0.449
2025.08    0.467
2025.08    0.498