Mental Manipulation Detection on PKU-SafeRLHF

Accuracy: 80 (Dual-agent), Dec 1, 2025
Updated 4d ago

Evaluation Results

Method   Links
2025.12   80   73.4   94    82.5   84.3   0.665   0.66
2025.12   72   64.1   100   78.1   81     0.639   0.639
2025.12   61   100    14.3  25     57.1   0.289   0.289