Interactive Social Privacy and Theory of Mind on Social Intelligence in Adversarial Dialogue (test)

Fooling % (Hard): 26.7 (GPT-5.4)

[Chart: Fooling % (Hard) by date, Apr 13, 2026]

Evaluation Results

Date      Fooling % (Hard)   Other metrics
2026.04   26.7               51.1   49.8   49.9   3.53
2026.04   16.7               45.3   44.7   49.9   4.2
2026.04   15.6               24.7   22     45.6   4.99
2026.04   6.2                38.7   40.7   46.2   4.15