Safety Evaluation on DeepInception (test)
[Chart: ASR by date. Plotted entry: No Defense, ASR 78.5, Jan 23, 2026.]
Updated 1mo ago
Evaluation Results
Method            Configuration               Date      ASR
No Defense        Backbone=Qwen2.5-14B-I...   2026.01   78.5
PPL               Backbone=Qwen2.5-14B-I...   2026.01   73
ICD               Backbone=Qwen2.5-14B-I...   2026.01   50.8
Self-Reminder     Backbone=Qwen2.5-14B-I...   2026.01   43.2
Self-Examination  Backbone=Qwen2.5-14B-I...   2026.01   5.6
SafeDecoding      Backbone=Qwen2.5-14B-I...   2026.01   0.8
SafeThinker       Backbone=Qwen2.5-14B-I...   2026.01   0.8