
Llama-Guard

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Safety Evaluation | Llama-Guard 3-8B | ASR 2.38 | 56 |
| Safety alignment evaluation | Llama-Guard | Harmfulness (%) 82.14 | 36 |
| Stealthiness Evaluation | LLaMA Guard 8B 3.1 | Mean PPL 2.16 | 10 |
| Stealthiness Evaluation | LLaMA Guard 2 8B | PPL Mean 2.34 | 10 |
| Stealthiness Evaluation | LLaMA Guard 7B | Mean Perplexity (PPL) 3.05 | 10 |
| Jailbreaking | LLaMA Guard 2 8B | Bypass Rate 100 | 1 |
| Jailbreaking | LLaMA Guard 7B | Bypass Rate 98.65 | 1 |
| Latent Causal Mechanism Inference | Llama Guard LLM latent mechanism | - | 0 |