
Sycophancy Evaluation on DebateQA

0.481 S (PD, L)

C. Opus

[Chart; date label: Apr 2, 2026]
Updated 13d ago

Evaluation Results

Method  Links
2026.04  0.481  0.793  0.501  0.262  0.278  0.232  0.573  0.813  1.388  0.211  0.034  0.32
2026.04  0.428  0.85   1.179  0.17   0.049  0.53   0.498  0.915  1.473  0.074  0.096  0.289
2026.04  0.361  1.194  1.273  0.619  1.092  0.867  0.256  0.772  0.861  0.099  0.32   0.231
2026.04  0.331  1.55   1.033  0.304  0.864  0.496  0.289  0.842  0.722  0.181  0.705  0.37
2026.04  0.228  0.724  0.696  0.156  0.37   0.319  0.08   0.585  0.627  0.051  0.317  0.502
2026.04  0.176  0.174  0.042  0.034  0.105  0.483  0.151  0.316  0.11   0.324  0.393  0.969