Sycophancy Evaluation on PHIL
[Chart: Sycophancy Preference by date. Highlighted point: Supervised Pinpoint Tuning, 99.34, Jan 26, 2026.]
Updated 1mo ago
Evaluation Results
| Method | Configuration | Date | Sycophancy Preference |
|---|---|---|---|
| Supervised Pinpoint Tuning | Model=Gemma-2-9B | 2026.01 | 99.34 |
| Synthetic Data Intervention | Model=Gemma-2-9B | 2026.01 | 98.73 |
| Untrained Gemma-2-9B | Model=Gemma-2-9B | 2026.01 | 98.71 |
| Supervised Pinpoint Tuning | Base Model=Gemma-2-2B | 2026.01 | 90.41 |
| Untrained Gemma-2-2B | Base Model=Gemma-2-2B | 2026.01 | 90.35 |
| Synthetic Data Intervention | Base Model=Gemma-2-2B | 2026.01 | 79.65 |
| Ours Resid | Model=Gemma-2-9B, Prob... | 2026.01 | 69.56 |
| Ours SAE | Model=Gemma-2-9B, Prob... | 2026.01 | 60.81 |
| Ours Resid | Base Model=Gemma-2-2B,... | 2026.01 | 53.98 |
| Ours SAE | Base Model=Gemma-2-2B,... | 2026.01 | 50.15 |