Harmful Refusal on DAN
Chart: ASR by date. Highlighted point: Defender-Only, 54.2 ASR, Oct 9, 2025.
Evaluation Results
Method               Details                    Date     ASR
Defender-Only        Base Model=Llama-3.1-8...  2025.10  54.2
Self-Play            Base Model=Llama-3.1-8...  2025.10  53.7
Llama-3.1-8B-IT      Model Status=Base Mode...  2025.10  53.3
SFT                  Base Model=Llama-3.1-8...  2025.10  46.8
Defender-Only + SFT  Base Model=Llama-3.1-8...  2025.10  45.2
Self-Play + SFT      Base Model=Llama-3.1-8...  2025.10  39.6
ELS                  Base Model=Llama-3.1-8...  2025.10  37.2