Malicious Prompt Detection on Combined All Datasets (test)
[Chart: ASR. ToxicDetector, 4.5, Feb 8, 2026. Selectable metrics: ASR, FPR, F1 Score.]
Evaluation Results
Method                Details                    Date     ASR   FPR   F1 Score
ToxicDetector         Params=300M + 7B, Fine...  2026.02  4.5   32.6  84.7
BAGEL                 Params=86M per finetun...  2026.02  9.5   6.6   92.2
Perspective API       Params=Not Known, Fine...  2026.02  56.9  6.8   64.2
LastLayer             Params=Not Applicable,...  2026.02  59.8  17.1  51.9
ShieldGemma           Params=2B, Fine-tunabl...  2026.02  62.4  3.8   53.4
OpenAIModeration API  Params=Not Known, Fine...  2026.02  88.1  2.4   20.8
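
The ordering of the methods depends on which metric is used. The short Python sketch below re-ranks the six listed methods by each metric. The numbers are copied from the results above. The direction of each metric (lower is better for ASR and FPR, higher is better for F1 Score) is an assumption, since the page does not state it, and the names `RESULTS`, `COLUMNS`, `HIGHER_IS_BETTER`, and `rank` are illustrative only.

```python
# Rows copied from the evaluation results: (method, ASR, FPR, F1 Score).
RESULTS = [
    ("ToxicDetector", 4.5, 32.6, 84.7),
    ("BAGEL", 9.5, 6.6, 92.2),
    ("Perspective API", 56.9, 6.8, 64.2),
    ("LastLayer", 59.8, 17.1, 51.9),
    ("ShieldGemma", 62.4, 3.8, 53.4),
    ("OpenAIModeration API", 88.1, 2.4, 20.8),
]

# Position of each metric inside a row.
COLUMNS = {"ASR": 1, "FPR": 2, "F1 Score": 3}

# Assumption (not stated on the page): lower ASR and FPR are better,
# higher F1 Score is better.
HIGHER_IS_BETTER = {"ASR": False, "FPR": False, "F1 Score": True}


def rank(metric):
    """Return method names ordered from best to worst on the given metric."""
    idx = COLUMNS[metric]
    ordered = sorted(
        RESULTS,
        key=lambda row: row[idx],
        reverse=HIGHER_IS_BETTER[metric],
    )
    return [row[0] for row in ordered]


if __name__ == "__main__":
    for metric in COLUMNS:
        print(metric, "->", ", ".join(rank(metric)))
```

Under these assumptions, a different method comes first on each metric: ToxicDetector on ASR, OpenAIModeration API on FPR, and BAGEL on F1 Score.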