
Unsafe Prompt Detection on ToxicChat

75.5 AUPRC

GradSafe-Zero

Updated 4mo ago

Evaluation Results

Method	Date	AUPRC
GradSafe-Zero	2024.02	75.5
Llama Guard	2024.02	63.5
OpenAI Moderation API	2024.02	60.4
Perspective API	2024.02	48.7