Bayesian Assessment of Sycophancy on BASIL Abstract setting 1.0 (test)
[Chart: Bayesian Error (RMSE). Highlighted result: gpt-4o-mini, 0.197, Aug 23, 2025]
Updated 1mo ago
Evaluation Results
Model             Method                     Date     Bayesian Error (RMSE)
gpt-4o-mini       Confidence Elicitation...  2025.08  0.197
claude-haiku-4-5  Confidence Elicitation...  2025.08  0.244
gpt-4o-mini       Confidence Elicitation...  2025.08  0.251
phi-4:14b         Confidence Elicitation...  2025.08  0.257
phi-4:14b         Confidence Elicitation...  2025.08  0.268
claude-haiku-4-5  Confidence Elicitation...  2025.08  0.269
llama-3.2:3b      Confidence Elicitation...  2025.08  0.279
llama-3.2:3b      Confidence Elicitation...  2025.08  0.293
llama-3.2:3b      Confidence Elicitation...  2025.08  0.303
llama-3.2:1b      Confidence Elicitation...  2025.08  0.307
llama-3.2:1b      Confidence Elicitation...  2025.08  0.31
mistral:7b        Confidence Elicitation...  2025.08  0.382
llama-3.2:1b      Confidence Elicitation...  2025.08  0.419
gpt-4o-mini       Confidence Elicitation...  2025.08  0.42
mistral:7b        Confidence Elicitation...  2025.08  0.454
claude-haiku-4-5  Confidence Elicitation...  2025.08  0.498
phi-4:14b         Confidence Elicitation...  2025.08  0.512
mistral:7b        Confidence Elicitation...  2025.08  0.531