Malicious Prompt Detection on Combined All Datasets (test)

4.5 ASR

ToxicDetector

Updated 4mo ago

Evaluation Results

Method	Date	ASR	(unlabeled)	(unlabeled)
ToxicDetector	2026.02	4.5	32.6	84.7
BAGEL	2026.02	9.5	6.6	92.2
Perspective API	2026.02	56.9	6.8	64.2
LastLayer	2026.02	59.8	17.1	51.9
ShieldGemma	2026.02	62.4	3.8	53.4
OpenAI Moderation API	2026.02	88.1	2.4	20.8