
Discriminatory Behaviour Detection on PKU-SafeRLHF

Accuracy: 96

Dual-agent

[Chart: accuracy over time; y-axis ticks 91.2, 93.6, 96, 98.4; data point dated Dec 1, 2025]

Evaluation Results

Method  Links
2025.12
96  95.1  97  96  98.2  0.929  0.938