
Sycophancy Evaluation on Offline Evaluation Set

Sycophancy Prevalence Score: 4 (gpt-5-thinking)

[Chart: Sycophancy Prevalence Score results, Dec 19, 2025]

Evaluation Results

Date       Sycophancy Prevalence Score
2025.12    4
2025.12    5.2
2025.12    14.5