Safety Evaluation on AutoDAN
Figure: Safety Score chart (top score: REFLECTOR (+GDPO), 100; date axis: May 20, 2026).
Evaluation Results
| Method | Backbone | Date | Safety Score |
|---|---|---|---|
| REFLECTOR (+GDPO) | Llama-3.1-8B-... | 2026.05 | 100.00 |
| STAIR | Llama-3.1-8B-... | 2026.05 | 99.04 |
| REFLECTOR (+SFT) | Llama-3.1-8B-... | 2026.05 | 98.26 |
| REFLECTOR (+GDPO) | Qwen-2.5-7B-I... | 2026.05 | 97.89 |
| Shallow-Align | Llama-3.1-8B-... | 2026.05 | 96.80 |
| Self-Critique | Llama-3.1-8B-... | 2026.05 | 96.15 |
| DPO | Llama-3.1-8B-... | 2026.05 | 95.38 |
| STAIR | Qwen-2.5-7B-I... | 2026.05 | 95.19 |
| Original | Llama-3.1-8B-... | 2026.05 | 94.75 |
| SFT | Llama-3.1-8B-... | 2026.05 | 94.62 |
| Shallow-Align | Qwen-2.5-7B-I... | 2026.05 | 91.80 |
| REFLECTOR (+SFT) | Qwen-2.5-7B-I... | 2026.05 | 90.38 |
| Original | Qwen-2.5-7B-I... | 2026.05 | 52.89 |
| DPO | Qwen-2.5-7B-I... | 2026.05 | 49.04 |
| Self-Critique | Qwen-2.5-7B-I... | 2026.05 | 43.70 |
| SFT | Qwen-2.5-7B-I... | 2026.05 | 38.46 |
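Because the two backbones start from very different baselines, the results may be easier to compare as a change relative to each backbone's "Original" entry. The following is a minimal Python sketch, assuming the scores above are transcribed correctly and assuming "Original" is the appropriate per-backbone baseline. The helper names `gain_over_original` and `best_per_backbone` are hypothetical and not part of any benchmark tooling; backbone names are kept truncated exactly as they appear on the page.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (method, backbone)

# Safety Scores on AutoDAN, transcribed from the Evaluation Results table.
# Backbone names are kept truncated exactly as they appear on the page.
LLAMA = "Llama-3.1-8B-..."
QWEN = "Qwen-2.5-7B-I..."
SCORES: Dict[Key, float] = {
    ("REFLECTOR (+GDPO)", LLAMA): 100.00,
    ("STAIR", LLAMA): 99.04,
    ("REFLECTOR (+SFT)", LLAMA): 98.26,
    ("REFLECTOR (+GDPO)", QWEN): 97.89,
    ("Shallow-Align", LLAMA): 96.80,
    ("Self-Critique", LLAMA): 96.15,
    ("DPO", LLAMA): 95.38,
    ("STAIR", QWEN): 95.19,
    ("Original", LLAMA): 94.75,
    ("SFT", LLAMA): 94.62,
    ("Shallow-Align", QWEN): 91.80,
    ("REFLECTOR (+SFT)", QWEN): 90.38,
    ("Original", QWEN): 52.89,
    ("DPO", QWEN): 49.04,
    ("Self-Critique", QWEN): 43.70,
    ("SFT", QWEN): 38.46,
}


def gain_over_original(scores: Dict[Key, float]) -> Dict[Key, float]:
    """Hypothetical helper: score minus the 'Original' score on the same
    backbone (treating 'Original' as the baseline is an assumption)."""
    baseline = {b: s for (m, b), s in scores.items() if m == "Original"}
    return {
        (m, b): round(s - baseline[b], 2)
        for (m, b), s in scores.items()
        if m != "Original"
    }


def best_per_backbone(scores: Dict[Key, float]) -> Dict[str, Tuple[str, float]]:
    """Hypothetical helper: highest-scoring method for each backbone."""
    best: Dict[str, Tuple[str, float]] = {}
    for (m, b), s in scores.items():
        if b not in best or s > best[b][1]:
            best[b] = (m, s)
    return best


if __name__ == "__main__":
    # Print every non-baseline entry, largest gain first.
    gains = gain_over_original(SCORES)
    for (m, b), g in sorted(gains.items(), key=lambda kv: -kv[1]):
        print(f"{b:18} {m:18} {g:+.2f}")
```

Under these assumptions, REFLECTOR (+GDPO) gains 5.25 points on the Llama backbone but 45.00 points on the Qwen backbone, while SFT, DPO, and Self-Critique on Qwen land below that backbone's Original score.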