
Sycophancy Assessment on BASIL 1.0 (Under-Update)

Change in Bayesian Error (RMSE): -0.355

mistral:7b

Aug 23, 2025
Updated 1mo ago

Evaluation Results

Date       Change in Bayesian Error (RMSE)
2025.08    -0.355
2025.08    -0.329
2025.08    -0.271
2025.08    -0.234
2025.08    -0.23
2025.08    -0.2
2025.08    -0.188
2025.08    -0.171
2025.08    -0.168
2025.08    -0.152
2025.08    -0.151
2025.08    -0.146
2025.08    -0.137
2025.08    -0.137
2025.08    -0.136
2025.08    -0.132
2025.08    -0.129
2025.08    -0.129
2025.08    -0.116
2025.08    -0.115
2025.08    -0.107
2025.08    -0.104
2025.08    -0.103
2025.08    -0.091
2025.08    -0.09
2025.08    -0.087
2025.08    -0.082
2025.08    -0.08
2025.08    -0.079
2025.08    -0.048
2025.08    -0.041
2025.08    -0.032