Sycophancy benchmark

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
--- | --- | --- | ---
Sycophancy Detection | Sycophancy benchmark (full evaluation set) | AUROC 0.732 | 12