Adversarial Detection on AutoDAN
[Chart: DSR. Labeled point: SALO, DSR 100, May 2, 2026]
Updated 28d ago
Evaluation Results

| Method | Target Model | Date | DSR |
|---|---|---|---|
| SALO | Llama-3.1... | 2026.05 | 100 |
| Linear Probe | Qwen2.5-7... | 2026.05 | 99.2 |
| Linear Probe | Llama-3.1... | 2026.05 | 98.8 |
| SALO | Qwen2.5-7... | 2026.05 | 98.8 |
| SALO | Mistral-7... | 2026.05 | 98.8 |
| Smooth LLM | Qwen2.5-7... | 2026.05 | 82 |
| GradSafe | Llama-3.1... | 2026.05 | 76 |
| Smooth LLM | Llama-3.1... | 2026.05 | 72 |
| PPL Filter | Llama-3.1... | 2026.05 | 67.2 |
| No Defense (1-ASR) | Qwen2.5-7... | 2026.05 | 55.6 |
| No Defense (1-ASR) | Llama-3.1... | 2026.05 | 54.4 |
| PPL Filter | Qwen2.5-7... | 2026.05 | 32.8 |
| Linear Probe | Mistral-7... | 2026.05 | 32 |
| GradSafe | Qwen2.5-7... | 2026.05 | 10 |
| No Defense (1-ASR) | Mistral-7... | 2026.05 | 3.2 |
| GradSafe | Mistral-7... | 2026.05 | 2 |
| Smooth LLM | Mistral-7... | 2026.05 | 1.6 |
| PPL Filter | Mistral-7... | 2026.05 | 0 |
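The "No Defense (1-ASR)" rows suggest that the undefended baseline is reported as the complement of the attack success rate (ASR). The page does not define the metric, so the following is only a minimal sketch under that assumption: it treats each baseline score as 100 minus ASR, both in percent, and recovers the implied ASR per target model. The function name `asr_from_no_defense_score` is hypothetical, and the model names are kept truncated as they appear on the page.

```python
def asr_from_no_defense_score(score_pct: float) -> float:
    """Return the implied attack success rate in percent.

    Assumption (not stated on the page): the "(1-ASR)" label means
    score = 100 - ASR, with both values expressed as percentages.
    """
    if not 0.0 <= score_pct <= 100.0:
        raise ValueError("score must be a percentage in [0, 100]")
    # Round to one decimal place, matching the precision of the reported scores.
    return round(100.0 - score_pct, 1)


# "No Defense (1-ASR)" scores copied from the results listed above.
no_defense_scores = {
    "Qwen2.5-7...": 55.6,
    "Llama-3.1...": 54.4,
    "Mistral-7...": 3.2,
}

implied_asr = {
    model: asr_from_no_defense_score(score)
    for model, score in no_defense_scores.items()
}

for model, asr in implied_asr.items():
    print(f"{model}: implied ASR {asr}%")
```

If that reading is right, comparing a method's score against the baseline for the same target model shows how much of the result is due to the defense rather than to the model's own refusals.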