Bayesian Assessment of Sycophancy on BASIL Abstract setting 1.0 (test)

Bayesian Error (RMSE): 0.197

gpt-4o-mini

[Chart: Bayesian Error (RMSE) by date; Aug 23, 2025]
Updated 1mo ago

Evaluation Results

Date      Bayesian Error (RMSE)
2025.08   0.197
2025.08   0.244
2025.08   0.251
2025.08   0.257
2025.08   0.268
2025.08   0.269
2025.08   0.279
2025.08   0.293
2025.08   0.303
2025.08   0.307
2025.08   0.31
2025.08   0.382
2025.08   0.419
2025.08   0.42
2025.08   0.454
2025.08   0.498
2025.08   0.512
2025.08   0.531