SOTA Malicious Prompt Detection benchmarks and papers with code | Wizwand
Malicious Prompt Detection
Benchmarks
| Dataset Name | SOTA Method | Metric | SOTA Value | Results | Last Updated |
| --- | --- | --- | --- | --- | --- |
| JailbreakV_28K Text-based (test) | VLMShield | FNR | 0 | 16 | 9d ago |
| JailbreakV_28K Image-based (test) | VLMShield | FNR | 0.19 | 16 | 9d ago |
| MMBench OOD | VLMShield | FPR | 0.16 | 14 | 9d ago |
| MM-Vet OOD | VLMShield | FPR | 3.67 | 14 | 9d ago |
| CC3M (IOD) | VLMShield | FPR | 0 | 14 | 9d ago |
| GPT4V-Caption (IOD) | VLMShield | FPR | 0 | 14 | 9d ago |
| Combined All Datasets (test) | ToxicDetector | ASR | 4.5 | 6 | 1mo ago |
| Weighted Average Across All Datasets | Enhanced Filtering and Summarization System | Accuracy | 98.71 | 4 | 1mo ago |
| ahsanayub/malicious-prompts | Enhanced Filtering and Summarization System | Accuracy | 98.72 | 4 | 1mo ago |
| codesagar/malicious-llm-prompts v3 | Enhanced Filtering and Summarization System | Accuracy (%) | 87.89 | 4 | 1mo ago |
| In-the-wild Jailbreak Prompts | Enhanced Filtering and Summarization System | Accuracy | 98.15 | 4 | 1mo ago |
| LLM-LAT/harmful-dataset | Enhanced Filtering and Summarization System | Accuracy | 92.1 | 4 | 1mo ago |
| Babelscape ALERT | Enhanced Filtering and Summarization System | Accuracy | 99.73 | 4 | 1mo ago |
| Llama-2 Prompt with Random Search 7B-Chat | JoPA | Detection Accuracy | 91 | 1 | 1mo ago |
| GCG attacks on Llama-2 7B-Chat | JoPA | Detection Accuracy | 100 | 1 | 1mo ago |
Showing 15 of 15 rows
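
The page does not define the metrics it lists. As a rough sketch, assuming the standard binary-classification definitions, FNR, FPR, and accuracy can be computed from a detector's predictions as shown here. The function name `detection_metrics` and the label convention (1 = malicious, 0 = benign) are illustrative assumptions and are not taken from Wizwand or the listed papers. ASR is left out because its exact definition varies between papers, and the page does not say whether values are fractions or percentages.

```python
import math
from dataclasses import dataclass


@dataclass
class DetectionMetrics:
    fnr: float       # share of malicious prompts the detector missed
    fpr: float       # share of benign prompts the detector flagged
    accuracy: float  # share of all prompts classified correctly


def detection_metrics(y_true, y_pred):
    """Compute FNR, FPR, and accuracy for a binary detector.

    Assumed convention (hypothetical): 1 = malicious, 0 = benign.
    A rate is NaN when its denominator is zero, e.g. FNR on a
    benign-only dataset.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    fnr = fn / (fn + tp) if (fn + tp) else math.nan
    fpr = fp / (fp + tn) if (fp + tn) else math.nan
    accuracy = (tp + tn) / len(pairs)
    return DetectionMetrics(fnr=fnr, fpr=fpr, accuracy=accuracy)


if __name__ == "__main__":
    # Toy labels for illustration only; not data from any benchmark.
    truth = [1, 1, 1, 1, 0, 0, 0, 0]
    preds = [1, 1, 1, 0, 0, 0, 1, 0]
    print(detection_metrics(truth, preds))
```

Under these definitions, lower is better for FNR and FPR, and higher is better for accuracy.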