
Sycophancy Assessment on BASIL 1.0 (All)

Change in Bayesian Error (RMSE): -0.096

gpt-4o-mini

[Chart of Change in Bayesian Error (RMSE); Aug 23, 2025]
Updated 1mo ago

Evaluation Results

Method        Date     Change in Bayesian Error (RMSE)
gpt-4o-mini   2025.08  -0.096
              2025.08  -0.057
              2025.08  -0.046
              2025.08  -0.042
              2025.08  -0.034
              2025.08  -0.032
              2025.08  -0.031
              2025.08  -0.025
              2025.08  -0.025
              2025.08  -0.023
              2025.08  -0.023
              2025.08  -0.023
              2025.08  -0.02
              2025.08  -0.015
              2025.08  -0.012
              2025.08  -0.011
              2025.08  -0.003
              2025.08  0
              2025.08  0
              2025.08  0.001
              2025.08  0.004
              2025.08  0.004
              2025.08  0.004
              2025.08  0.006
              2025.08  0.009
              2025.08  0.016
              2025.08  0.018
              2025.08  0.028
              2025.08  0.028
              2025.08  0.037
              2025.08  0.059
              2025.08  0.066
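The leaderboard reports a change in RMSE but does not define it on this page. The sketch below illustrates the arithmetic under two stated assumptions, neither confirmed here: that "Bayesian error" is the RMSE between a model's stated probabilities and Bayes-rule reference posteriors, and that the reported change is the error under one condition minus the error under a baseline condition. Under that reading, a negative value means the model moved closer to the Bayesian reference. The function names and the toy probabilities are hypothetical and are not taken from BASIL.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences of probabilities."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("inputs must be non-empty and the same length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def change_in_bayesian_error(baseline_probs, treated_probs, bayes_posteriors):
    """Hypothetical definition (assumption, not from BASIL):
    RMSE under the treated condition minus RMSE under the baseline condition.
    Negative means the treated answers are closer to the Bayesian reference."""
    return rmse(treated_probs, bayes_posteriors) - rmse(baseline_probs, bayes_posteriors)


# Hypothetical toy values for illustration only.
bayes = [0.8, 0.3, 0.6]     # Bayes-rule reference posteriors
baseline = [0.6, 0.5, 0.6]  # model probabilities, baseline condition
treated = [0.7, 0.4, 0.6]   # model probabilities, comparison condition

delta = change_in_bayesian_error(baseline, treated, bayes)
print(round(delta, 4))  # -0.0816
```

Here the baseline RMSE is about 0.163 and the comparison RMSE about 0.082, so the change is negative. If the benchmark instead subtracts in the other order, the sign convention flips.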