Malicious Prompt Detection

Benchmarks

Dataset Name	SOTA Method	Metric	Value	Count	Last Updated
VLGuard & MSSBench π = 0.005	VLMGuard	AUROC (VLGuard & MSSBench)	0.9195	26	18d ago
VLGuard & MLLMGuard π = 0.005	VLMGuard	AUROC	94.37	26	18d ago
JailBreakV & GPT4V π = 0.005	VLMGuard	AUROC	98.06	26	18d ago
JailbreakV_28K Text-based (test)	VLMShield	FNR	0	16	3mo ago
JailbreakV_28K Image-based (test)	VLMShield	FNR	0.19	16	3mo ago
MMBench OOD	VLMShield	FPR	0.16	14	3mo ago
MM-Vet OOD	VLMShield	FPR	3.67	14	3mo ago
CC3M (IOD)	VLMShield	FPR	0	14	3mo ago
GPT4V-Caption (IOD)	VLMShield	FPR	0	14	3mo ago
Meta-instruction prompts	DE-FIVE	Language Accuracy	97.1	12	1mo ago
Combined All Datasets (test)	ToxicDetector	ASR	4.5	6	4mo ago
VLGuard & MSSBench	VLMGuard	AUROC	97.48	5	18d ago
VLGuard & MLLMGuard	VLMGuard	AUROC	96.82	5	18d ago
JailBreakV GPT4V	VLMGuard	AUROC	99.05	5	18d ago
Weighted Average Across All Datasets	Enhanced Filtering and Summarization System	Accuracy	98.71	4	4mo ago
ahsanayub/malicious-prompts	Enhanced Filtering and Summarization System	Accuracy	98.72	4	4mo ago
codesagar/malicious-llm-prompts v3	Enhanced Filtering and Summarization System	Accuracy (%)	87.89	4	4mo ago
In-the-wild Jailbreak Prompts	Enhanced Filtering and Summarization System	Accuracy	98.15	4	4mo ago
LLM-LAT/harmful-dataset	Enhanced Filtering and Summarization System	Accuracy	92.1	4	4mo ago
Babelscape ALERT	Enhanced Filtering and Summarization System	Accuracy	99.73	4	4mo ago
AdvBench (subset)	v2 rubric (deterministic synthesis function)	Fires Rate	30	1	1mo ago
Llama-2 Prompt with Random Search 7B-Chat	JoPA	Detection Accuracy	91	1	4mo ago
GCG attacks on Llama-2 7B-Chat	JoPA	Detection Accuracy	100	1	4mo ago
