
Insulting Behavior Detection on PKU-SafeRLHF

Accuracy: 78

Single agent

[Chart: Accuracy, data point dated Dec 1, 2025]
Updated 3mo ago

Evaluation Results

Method    Links
2025.12
78    71.9    92    80.7    77.8    0.541    0.551