Sycophancy Mitigation on Sycophancy
[Chart: Sycophancy and Helpfulness scores by method. Highlighted point: SFT (NO RLHF), Sycophancy 36.2, Feb 2, 2026.]
Updated 4d ago
Evaluation Results
Method                       Attribute                  Date     Sycophancy  Helpfulness
SFT (NO RLHF)                RLHF=None                  2026.02  36.2        41.3
ARA                          Framework=Auditor-gate...  2026.02  38.4        77.2
INFORM                       Category=Regularizatio...  2026.02  47.2        62.1
ODIN                         Category=Regularizatio...  2026.02  51.6        63.4
RM ENSEMBLE                  Category=Reward Model...   2026.02  52.8        64.6
FILTERING (R > mu + 2sigma)  Category=Training Inte...  2026.02  55.1        50.3
PPO W/KL                     Category=Regularizatio...  2026.02  58.3        68.2
PPO                          Condition=Unmitigated      2026.02  72.4        76.8