Concept Unlearning Robustness on Ring-A-Bell adversarial prompts (K77)

ASR (Threshold 0.3): 1.05

ACE

Mar 19, 2026
Updated 1mo ago

Evaluation Results

Method   Links
2026.03   1.05    0       0       0
2026.03   1.05    1.05    0       0
2026.03   22.11   15.79   7.37    1.05
2026.03   36.84   22.21   15.79   6.32
2026.03   41.05   31.58   17.89   8.42
2026.03   49.47   41.05   18.95   2.11
2026.03   49.47   38.95   22.11   9.47
2026.03   53.68   41.05   17.89   4.21
2026.03   65.26   54.47   33.68   14.74
2026.03   74.74   66.32   45.26   25.26