Alignment Evaluation on Sycophancy
[Chart: Mean Improvement by date (y-axis ticks 29.16 to 46.17). Best result: 51, iPASa, Sep 25, 2025. Legend: Mean Improvement, 95% CI, P-Value.]
Updated 16d ago
Evaluation Results
| Method | Setting | Date | Mean Improvement | 95% CI | P-Value |
|--------|---------|------|------------------|--------|---------|
| iPASa | Model=Nous-Hermes-2-Mi... | 2025.09 | 51 | 50 | 0 |
| PASf | Model=Nous-Hermes-2-Mi... | 2025.09 | 51 | 50 | 0 |
| iPASa | Model=Llama-3.1-8B-Ins... | 2025.09 | 43 | 41 | 0 |
| PASf | Model=Llama-3.1-8B-Ins... | 2025.09 | 43 | 42 | 0 |
| iPASa | Model=DeepSeek-R1-Dist... | 2025.09 | 33 | 29 | 0 |
| iPASwo | Model=DeepSeek-R1-Dist... | 2025.09 | 33 | 29 | 0 |
| PASf | Model=DeepSeek-R1-Dist... | 2025.09 | 33 | 29 | 0 |
| iPASwo | Model=Llama-3.1-8B-Ins... | 2025.09 | 33 | 28 | 0 |
| iPASwo | Model=Nous-Hermes-2-Mi... | 2025.09 | 30 | 21 | 0 |