Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Sycophancy Evaluation

Benchmarks

Task NameDataset NameSOTA ResultTrend
Sycophancy EvaluationSycophancy Evaluation Dataset
Total Sycophancy Score0.123
32
Sycophancy EvaluationSycophancy Evaluation Opinion
PSS8.14
14
Sycophancy EvaluationSycophancy Evaluation Factual
PSS0.1124
14
Sycophancy EvaluationSycophancy Evaluation
BRR13.3
9
Showing 4 of 4 rows