Unsafe Prompt Detection

Benchmarks

Dataset	SOTA Method	Metric	Value
ToxicChat (test)	OpenAI Moderation API	Precision	0.815
XSTest (test)	OpenAI Moderation API	Precision	87.8
XSTest	GradSafe-Zero	AUPRC	93.6
ToxicChat	GradSafe-Zero	AUPRC	75.5

Values are shown as reported. The ToxicChat (test) precision is on a 0–1 scale; the other three values are on a 0–100 scale.

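The benchmarks report two metrics for a binary detector, where a prompt is labeled either unsafe (1) or safe (0). Precision is computed from hard predictions at a fixed threshold. AUPRC summarizes the whole precision-recall curve from continuous scores. The sketch below is a minimal illustration, not the leaderboard's evaluation code. It assumes AUPRC is estimated as step-wise average precision, which is the definition scikit-learn's average_precision_score uses when scores are untied. The leaderboard does not say which estimator it used, and a trapezoidal estimate can differ slightly. The labels, scores, and the 0.5 threshold are hypothetical.

```python
from typing import Sequence


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of prompts flagged unsafe (pred == 1) that are truly unsafe (label == 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPRC estimate: mean precision at the rank of each unsafe prompt.

    Assumption: scores are distinct, so ties are not handled.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0
    # Rank prompts from most to least likely unsafe.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    tp = 0
    total = 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            total += tp / k  # precision at cutoff k
    return total / n_pos


if __name__ == "__main__":
    labels = [1, 0, 1, 0, 1]  # hypothetical ground truth: 1 = unsafe prompt
    scores = [0.9, 0.8, 0.7, 0.3, 0.2]  # hypothetical detector scores
    preds = [int(s >= 0.5) for s in scores]  # hypothetical threshold of 0.5
    print(round(precision(labels, preds), 3))  # 0.667
    print(round(average_precision(labels, scores), 3))  # 0.756
```

Precision depends on the chosen threshold. AUPRC is threshold-free, which is one reason it suits comparing score-based detectors such as GradSafe-Zero across datasets.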