Adversarial Toxicity Refusal on LLaMA-2 Chatbot (Offensive category)
[Chart: results over time by method (selectable metrics: RTR, PPL, FBD, Delta R); highest RTR is 47.7 for Adv. attack, Jul 8, 2025]
Evaluation Results
Method        Defense setting   Date     RTR    PPL    FBD    Delta R
Adv. attack   Adv. a...         2025.07  47.7   5.29   9.8    -2.86
No attack     No attack         2025.07  8.8    4.27   10.2   -
Optimus (NH)  Optimu...         2025.07  0.2    6.17   10.9   -