
Mental Manipulation Detection on PKU-SafeRLHF

Best result: 80 Accuracy (Dual-agent)

Updated 5mo ago

Evaluation Results

Method	Date	Links	Accuracy
Dual-agent	2025.12		80	73.4	94	82.5	84.3	0.665	0.66
MV (N = 40)	2025.12		72	64.1	100	78.1	81	0.639	0.639
Rule-based	2025.12		61	100	14.3	25	57.1	0.289	0.289
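
Only the first numeric value in each row is labeled by the page (Accuracy); the remaining columns carry no headers. As a rough sanity check, the sketch below tests one possible reading: that the second, third, and fourth numeric values are precision, recall, and F1 in percent. That reading is an assumption, not something the page states, and the helper names `f1_from` and `max_f1_gap` are hypothetical.

```python
# ASSUMPTION: the 2nd-4th numeric values per row are precision, recall, F1 (percent).
# The source table does not label these columns; this only tests that reading.
rows = {
    "Dual-agent": (73.4, 94.0, 82.5),
    "MV (N = 40)": (64.1, 100.0, 78.1),
    "Rule-based": (100.0, 14.3, 25.0),
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


def max_f1_gap(table):
    """Largest absolute gap between the recomputed and the listed third value."""
    return max(abs(f1_from(p, r) - f1) for p, r, f1 in table.values())


for name, (p, r, f1) in rows.items():
    print(f"{name}: recomputed {f1_from(p, r):.1f}, listed {f1}")
```

Under this assumption, the recomputed values agree with the listed ones to within 0.1 point for all three methods, which is what rounding alone would produce. The check says nothing about the last three columns, which stay unlabeled.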