| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| Sycophancy Evaluation Opinion | Mistral-7B | PSS | 8.14 | 14 | 11d ago |
| Sycophancy Evaluation Factual | Llama-3 | PSS | 0.1124 | 14 | 11d ago |
| PHIL | Supervised Pinpoint Tuning | Sycophancy Preference | 99.34 | 10 | 1mo ago |
| POLI | Ours Resid | Sycophantic Preference (%) | 92.18 | 10 | 1mo ago |
| NLP | Synthetic Data Intervention | Sycophancy Preference | 49.25 | 10 | 1mo ago |
| Open-Ended Sycophancy | Synthetic Data Intervention | Syc Score | 48.15 | 10 | 1mo ago |
| Syco-Bench | n/a | Pickside Score | 1.21 | 10 | 1mo ago |
| SycophancyEval | Lag-DPO | Sycophancy Rate | 54.2 | 9 | 17d ago |
| DebateQA | n/a | S (PD, L) | 0.481 | 6 | 13d ago |
| AITA | n/a | Sycophancy Score (S) PD-L | 0.54 | 6 | 13d ago |
| TruthfulQA (adversarial) | Silicon Mirror | Sycophantic Response Count | 1 | 3 | 17d ago |
| TruthfulQA Adversarial n=50 | Gemini 2.5 Flash (Static Guardrails) | Sycophantic Responses Count | 2 | 3 | 17d ago |
| Offline Evaluation Set | gpt-5-thinking | Sycophancy Prevalence Score | 4 | 3 | 1mo ago |
| Early A/B tests Online prevalence | gpt-5-main | Prevalence Change (Free Users) | -0.69 | 1 | 1mo ago |