Bayesian Assessment of Sycophancy on BASIL Third-p. belief setting 1.0 (test)
[Chart: Bayesian Error (RMSE). Highlighted point: gpt-4o-mini, 0.16, Aug 23, 2025.]
Evaluation Results

| Model | Method | Date | Bayesian Error (RMSE) |
|---|---|---|---|
| gpt-4o-mini | Confidence Elicitation... | 2025.08 | 0.16 |
| gpt-4o-mini | Confidence Elicitation... | 2025.08 | 0.189 |
| claude-haiku-4-5 | Confidence Elicitation... | 2025.08 | 0.23 |
| phi-4:14b | Confidence Elicitation... | 2025.08 | 0.258 |
| claude-haiku-4-5 | Confidence Elicitation... | 2025.08 | 0.259 |
| phi-4:14b | Confidence Elicitation... | 2025.08 | 0.259 |
| gpt-4o-mini | Confidence Elicitation... | 2025.08 | 0.271 |
| llama-3.2:3b | Confidence Elicitation... | 2025.08 | 0.292 |
| llama-3.2:1b | Confidence Elicitation... | 2025.08 | 0.302 |
| llama-3.2:1b | Confidence Elicitation... | 2025.08 | 0.309 |
| llama-3.2:3b | Confidence Elicitation... | 2025.08 | 0.312 |
| llama-3.2:3b | Confidence Elicitation... | 2025.08 | 0.327 |
| llama-3.2:1b | Confidence Elicitation... | 2025.08 | 0.358 |
| mistral:7b | Confidence Elicitation... | 2025.08 | 0.386 |
| phi-4:14b | Confidence Elicitation... | 2025.08 | 0.443 |
| mistral:7b | Confidence Elicitation... | 2025.08 | 0.449 |
| claude-haiku-4-5 | Confidence Elicitation... | 2025.08 | 0.467 |
| mistral:7b | Confidence Elicitation... | 2025.08 | 0.498 |
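The page reports Bayesian Error as an RMSE but does not say which quantities are compared. As a rough sketch only, assuming the metric is the root-mean-square difference between a model's elicited confidence and a reference posterior probability, the computation could look like the following. The function name `rmse` and the sample values are hypothetical and are not taken from the benchmark.

```python
import math


def rmse(elicited, reference):
    """Root-mean-square error between two equal-length sequences of probabilities."""
    if len(elicited) != len(reference):
        raise ValueError("sequences must have the same length")
    if not elicited:
        raise ValueError("sequences must be non-empty")
    # Square each per-item difference, average, then take the square root.
    squared = [(e - r) ** 2 for e, r in zip(elicited, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical values for illustration only; not benchmark data.
elicited = [0.9, 0.4, 0.7]   # assumed: probabilities stated by a model
reference = [0.8, 0.5, 0.7]  # assumed: reference posterior probabilities
score = rmse(elicited, reference)
print(round(score, 4))
```

Under this reading, a lower value means the elicited confidences sit closer to the reference, which matches the ascending order of the results listed above.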