Stealthiness Evaluation

Benchmarks

Dataset Name	SOTA Method	Metric	Value	Count	Updated
LLaMA 8B 3.1	ArtPrompt	Mean Perplexity (PPL)	1.99	10	2mo ago
LLaMA 2 7B		Mean Perplexity (PPL)	2.75	10	2mo ago
LLaMA 7B		Mean Perplexity (PPL)	2.78	10	2mo ago
WildGuard 7B		Mean Perplexity (PPL)	2.33	10	2mo ago
LLaMA Guard 8B 3.1	ArtPrompt	Mean Perplexity (PPL)	2.16	10	2mo ago
LLaMA Guard 2 8B	ArtPrompt	Mean Perplexity (PPL)	2.34	10	2mo ago
LLaMA Guard 7B		Mean Perplexity (PPL)	3.05	10	2mo ago
Harmful prompts (evaluated on 3 LLMs and 4 guard LLMs)	ArtPrompt	Mean Perplexity (PPL)	3.23	10	2mo ago
Medium Web Browser	WebTrap	Dual-Goal Success Rate	23.81	7	2mo ago
Long Web Browser	WebTrap	Dual-Goal Success Rate	47.62	7	2mo ago
MetaQA	AURA	Detected Samples	8,321	3	4mo ago
ImageNet	INACTIVE	SSIM	0.9867	2	4mo ago
