Insulting Behavior Detection on PKU-SafeRLHF

Accuracy: 78

Single agent

[Chart: Accuracy over time; data point dated Dec 1, 2025]

Evaluation Results

Method Links
2025.12
78 71.9 92 80.7 77.8 0.541 0.551