Adversarial Toxicity Refusal on LLaMA-2 Chatbot (Specialized category)
[Chart: Refusal Rate (RTR) by method; highlighted value 59.8 (Adv. attack), Jul 8, 2025. Selectable metrics: Refusal Rate (RTR), Perplexity (PPL), FBD, Delta R.]
Updated 16d ago
Evaluation Results
| Method | Defense setting | Date | Refusal Rate (RTR) | Perplexity (PPL) | FBD | Delta R |
|---|---|---|---|---|---|---|
| Adv. attack | Adv. a... | 2025.07 | 59.8 | 5.85 | 9.7 | - |
| No attack | No attack | 2025.07 | 13.1 | 5.46 | 9.7 | - |
| Optimus (NH) | Optimu... | 2025.07 | 1.8 | 6.23 | 10.3 | -1.55 |