Sycophancy Evaluation on Sycophancy Evaluation Factual

0.1124 PSS

Llama-3

[Chart: PSS over time; x-axis label: Apr 7, 2026]
Updated 11d ago

Evaluation Results

Date      PSS
2026.04   0.1124   0.0941   0.3073   0.0218
2026.04   0.0714   0.1203   0.3847   0
2026.04   0.0491   0.0873   0.2914   0.0403
2026.04   0.0374   0.1047   0.3312   0.0271
2026.04   0.0318   0.1187   0.3841   0
2026.04   0.0312   0.1821   0.4214   0
2026.04   0.0297   0.0907   0.3      0
2026.04   0.0241   0.1394   0.4218   0
2026.04   0.0187   0.1241   0.3814   0
2026.04   0.0161   0.1407   0.4516   0
2026.04   0.0147   0.1631   0.4893   0
2026.04   0.0143   0.1318   0.4027   0
2026.04   0.0124   0.1487   0.4312   0
2026.04   0.0098   0.1574   0.4618   0