Adversarial Detection on Prefilling
Top result: SALO, 99.4 DSR (May 2, 2026)
Updated 28d ago
Evaluation Results
| Method | Target Model | Date | DSR |
| --- | --- | --- | --- |
| SALO | Qwen2.5-7... | 2026.05 | 99.4 |
| SALO | Mistral-7... | 2026.05 | 99.4 |
| SALO | Llama-3.1... | 2026.05 | 98.8 |
| Linear Probe | Qwen2.5-7... | 2026.05 | 96.5 |
| Linear Probe | Llama-3.1... | 2026.05 | 87.3 |
| Linear Probe | Mistral-7... | 2026.05 | 68.1 |
| Smooth LLM | Llama-3.1... | 2026.05 | 66.5 |
| Smooth LLM | Qwen2.5-7... | 2026.05 | 57.9 |
| PPL Filter | Llama-3.1... | 2026.05 | 43.7 |
| Smooth LLM | Mistral-7... | 2026.05 | 42.9 |
| No Defense (1-ASR) | Llama-3.1... | 2026.05 | 39.8 |
| PPL Filter | Qwen2.5-7... | 2026.05 | 39.2 |
| No Defense (1-ASR) | Qwen2.5-7... | 2026.05 | 24.8 |
| No Defense (1-ASR) | Mistral-7... | 2026.05 | 15 |
| GradSafe | Llama-3.1... | 2026.05 | 12 |
| GradSafe | Mistral-7... | 2026.05 | 1.7 |
| PPL Filter | Mistral-7... | 2026.05 | 0 |
| GradSafe | Qwen2.5-7... | 2026.05 | 0 |
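The results above list each method once per target model, which makes it hard to compare methods at a glance. As a rough sketch, the snippet below averages DSR per method across the three target models. The averaging is an illustrative summary and not part of the leaderboard itself; the names `RESULTS` and `mean_dsr_by_method` are hypothetical, and the model names are kept truncated exactly as they appear on the page.

```python
from collections import defaultdict

# (method, target model as truncated on the page, DSR), copied from the results above.
RESULTS = [
    ("SALO", "Qwen2.5-7...", 99.4),
    ("SALO", "Mistral-7...", 99.4),
    ("SALO", "Llama-3.1...", 98.8),
    ("Linear Probe", "Qwen2.5-7...", 96.5),
    ("Linear Probe", "Llama-3.1...", 87.3),
    ("Linear Probe", "Mistral-7...", 68.1),
    ("Smooth LLM", "Llama-3.1...", 66.5),
    ("Smooth LLM", "Qwen2.5-7...", 57.9),
    ("PPL Filter", "Llama-3.1...", 43.7),
    ("Smooth LLM", "Mistral-7...", 42.9),
    ("No Defense (1-ASR)", "Llama-3.1...", 39.8),
    ("PPL Filter", "Qwen2.5-7...", 39.2),
    ("No Defense (1-ASR)", "Qwen2.5-7...", 24.8),
    ("No Defense (1-ASR)", "Mistral-7...", 15.0),
    ("GradSafe", "Llama-3.1...", 12.0),
    ("GradSafe", "Mistral-7...", 1.7),
    ("PPL Filter", "Mistral-7...", 0.0),
    ("GradSafe", "Qwen2.5-7...", 0.0),
]


def mean_dsr_by_method(rows):
    """Return {method: mean DSR across target models}, rounded to one decimal."""
    buckets = defaultdict(list)
    for method, _model, dsr in rows:
        buckets[method].append(dsr)
    return {method: round(sum(vals) / len(vals), 1) for method, vals in buckets.items()}


if __name__ == "__main__":
    # Print methods from highest to lowest mean DSR.
    means = mean_dsr_by_method(RESULTS)
    for method, mean in sorted(means.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{method:20s} {mean:5.1f}")
```

An unweighted mean treats each target model equally. That is a simplifying assumption, and it hides per-model spread such as the gap between the Linear Probe rows.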