Malicious Prompt Detection on In-the-wild Jailbreak Prompts
[Chart: Accuracy over time. Top result: Enhanced Filtering and Summarization System, Accuracy 98.15, May 2, 2025. Updated 1mo ago.]
Evaluation Results
Method                                       Details                 Date     Accuracy
Enhanced Filtering and Summarization System  Number of Prompts=1405  2025.05  98.15
Logistic Regression                          Number of Prompts=1405  2025.05  93.59
Toxic-BERT                                   Number of Prompts=1405  2025.05  6.62
Hate Speech Detector                         Number of Prompts=1405  2025.05  1.92