Sycophancy benchmark

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Sycophancy Detection | Sycophancy benchmark (full evaluation set) | AUROC 0.732 | 12