Harmful Refusal on DAN
[Chart: ASR over time, Oct 9, 2025 to May 8, 2026. Highlighted entry: Qwen2.5-3B-IT, ASR 94.9.]
Updated 22d ago
Evaluation Results
| Method | Settings | Date | ASR |
|---|---|---|---|
| Qwen2.5-3B-IT | Backbone=Qwen2.5-3B-IT | 2026.05 | 94.9 |
| Qwen2.5-7B-IT | Backbone=Qwen2.5-7B-IT | 2026.05 | 89.7 |
| SELF-REDTEAM | Backbone=Qwen2.5-3B-IT | 2026.05 | 88.5 |
| Qwen2.5-14B-IT | Backbone=Qwen2.5-14B-IT | 2026.05 | 84.5 |
| ABS | Backbone=Qwen2.5-7B-IT | 2026.05 | 75.6 |
| SELF-REDTEAM | Backbone=Qwen2.5-7B-IT | 2026.05 | 72.6 |
| ABS | Backbone=Qwen2.5-3B-IT | 2026.05 | 72.1 |
| ABS | Backbone=Qwen2.5-14B-IT | 2026.05 | 70.6 |
| SELF-REDTEAM | Backbone=Qwen2.5-14B-IT | 2026.05 | 66.1 |
| Defender-Only | Base Model=Llama-3.1-8... | 2025.10 | 54.2 |
| Self-Play | Base Model=Llama-3.1-8... | 2025.10 | 53.7 |
| Llama-3.1-8B-IT | Model Status=Base Mode... | 2025.10 | 53.3 |
| SFT | Base Model=Llama-3.1-8... | 2025.10 | 46.8 |
| Defender-Only + SFT | Base Model=Llama-3.1-8... | 2025.10 | 45.2 |
| Self-Play + SFT | Base Model=Llama-3.1-8... | 2025.10 | 39.6 |
| ELS | Base Model=Llama-3.1-8... | 2025.10 | 37.2 |