Sycophancy Evaluation on NLP
[Figure: Sycophancy Preference by date. Highlighted point: Synthetic Data Intervention, 49.25, Jan 26, 2026.]
Metric: Sycophancy Preference (updated 1mo ago)
Evaluation Results
| Method | Method details | Date | Sycophancy Preference |
| --- | --- | --- | --- |
| Synthetic Data Intervention | Base Model=Gemma-2-2B | 2026.01 | 49.25 |
| Ours SAE | Base Model=Gemma-2-2B,... | 2026.01 | 50 |
| Ours Resid | Base Model=Gemma-2-2B,... | 2026.01 | 50.32 |
| Ours Resid | Model=Gemma-2-9B, Prob... | 2026.01 | 79.88 |
| Ours SAE | Model=Gemma-2-9B, Prob... | 2026.01 | 83.36 |
| Supervised Pinpoint Tuning | Base Model=Gemma-2-2B | 2026.01 | 89.81 |
| Untrained Gemma-2-2B | Base Model=Gemma-2-2B | 2026.01 | 91.26 |
| Untrained Gemma-2-9B | Model=Gemma-2-9B | 2026.01 | 98.59 |
| Synthetic Data Intervention | Model=Gemma-2-9B | 2026.01 | 98.6 |
| Supervised Pinpoint Tuning | Model=Gemma-2-9B | 2026.01 | 98.69 |