Bayesian Assessment of Sycophancy on BASIL User belief setting 1.0 (test)
Best reported Bayesian Error (RMSE): 0.156 (gpt-4o-mini, Aug 23, 2025)
Updated 1mo ago
Evaluation Results

Model             Method                      Date     Bayesian Error (RMSE)
gpt-4o-mini       Confidence Elicitation...   2025.08  0.156
gpt-4o-mini       Confidence Elicitation...   2025.08  0.184
llama-3.2:1b      Confidence Elicitation...   2025.08  0.219
claude-haiku-4-5  Confidence Elicitation...   2025.08  0.244
phi-4:14b         Confidence Elicitation...   2025.08  0.246
gpt-4o-mini       Confidence Elicitation...   2025.08  0.258
phi-4:14b         Confidence Elicitation...   2025.08  0.273
claude-haiku-4-5  Confidence Elicitation...   2025.08  0.273
llama-3.2:1b      Confidence Elicitation...   2025.08  0.316
llama-3.2:3b      Confidence Elicitation...   2025.08  0.32
llama-3.2:3b      Confidence Elicitation...   2025.08  0.33
llama-3.2:3b      Confidence Elicitation...   2025.08  0.339
mistral:7b        Confidence Elicitation...   2025.08  0.341
llama-3.2:1b      Confidence Elicitation...   2025.08  0.366
mistral:7b        Confidence Elicitation...   2025.08  0.422
phi-4:14b         Confidence Elicitation...   2025.08  0.425
claude-haiku-4-5  Confidence Elicitation...   2025.08  0.476
mistral:7b        Confidence Elicitation...   2025.08  0.477