Beyond the False Trade-off: Adaptive EWC for Stealthy and Generalizable T2I Backdoors
About
Preserving model fidelity is essential for stealthy text-to-image (T2I) backdoor attacks. Existing methods such as Learning without Forgetting (LwF) rely on output-based distillation, which provides limited regularization. We introduce Elastic Weight Consolidation (EWC) as a parameter-based alternative for preserving fidelity in backdoor learning. Although EWC is stronger in principle, we show that its standard static form, with a fixed regularization weight λ and a mean-squared utility loss, creates an artificial trade-off between attack success rate (ASR) and fidelity, and degrades performance on weak triggers in particular. To address this, we propose Cosine-Aware Adaptive EWC, which dynamically adjusts the EWC regularization using a cosine-based semantic utility and adaptive scheduling. This approach transforms EWC from a fixed penalty into a context-sensitive constraint, maintaining high ASR while preserving model fidelity. Experiments demonstrate a better ASR-fidelity balance and greater robustness on out-of-domain (OOD) datasets than existing baselines.
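To make the contrast between a fixed and an adaptive penalty concrete, here is a minimal sketch in plain Python. The quadratic form in `ewc_penalty` is the standard EWC regularizer, (λ/2) Σ F_i (θ_i − θ*_i)². The rule in `adaptive_lambda` is purely an assumption for illustration: the abstract does not specify how the cosine utility drives the schedule, so the linear ramp, the function names, and the `lam_min`/`lam_max` parameters are all hypothetical rather than the authors' method.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def ewc_penalty(theta, theta_star, fisher, lam):
    """Standard EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2."""
    return 0.5 * lam * sum(
        f * (t - s) ** 2 for t, s, f in zip(theta, theta_star, fisher)
    )


def adaptive_lambda(lam_max, cos_sim, lam_min=0.0):
    """Hypothetical schedule (an assumption, not taken from the paper).

    While the triggered output is still far from the target (low cosine
    similarity) the penalty stays weak so the backdoor can be learned; as the
    similarity approaches 1 the penalty ramps linearly up to lam_max.
    """
    progress = min(max(cos_sim, 0.0), 1.0)  # clamp to [0, 1]
    return lam_min + (lam_max - lam_min) * progress


# Example: one training step's regularization term under the assumed schedule.
theta = [1.0, 2.0]        # current parameters
theta_star = [0.0, 2.0]   # clean (pre-attack) parameters
fisher = [2.0, 3.0]       # diagonal Fisher information estimates
sim = cosine_similarity([1.0, 0.0], [1.0, 1.0])
penalty = ewc_penalty(theta, theta_star, fisher, adaptive_lambda(4.0, sim))
```

With a static schedule, `lam` would be passed as a constant; the only change in the adaptive variant sketched here is that it is recomputed from the current cosine utility at each step.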
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Backdoor attack on text-to-image models | Mixed-style OOD prompts on LoRA experts (5 seeds) | Attack Success Rate (ASR) | 100 | 5 |
| LoRA Stylization Preservation | Anime LoRA | ASR | 1 | 5 |
| LoRA Stylization Preservation | Photoreal LoRA | ASR | 1 | 5 |
| Backdoor Attack Evaluation | SD-Prompts Unicode trigger (in-domain) | ASR (τ=0.85) | 98.7 | 4 |
| Backdoor Attack Evaluation | SD-Prompts Phrase trigger (in-domain) | ASR (τ=0.85) | 99 | 4 |
| Backdoor Attack Evaluation | SD-Prompts Syntactic trigger (in-domain) | ASR (τ=0.85) | 97.5 | 4 |
| Out-of-distribution (OOD) Generalization | AG News Syntactic Trigger | ASR (τ=0.85) | 73.2 | 2 |
| Out-of-distribution (OOD) Generalization | AG News Unicode Trigger | ASR (τ=0.85) | 61.2 | 2 |
| Out-of-distribution (OOD) Generalization | AG News Phrase Trigger | ASR (τ=0.85) | 98.6 | 2 |
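
The page does not define τ. One plausible reading, stated here as an assumption, is that an attack counts as successful when a similarity score between the generated output and the attack target reaches the threshold τ, and ASR is the percentage of triggered prompts that succeed. Under that assumption, the metric could be computed as in the sketch below; the function name and the input list are hypothetical.

```python
def attack_success_rate(similarities, tau=0.85):
    """Percentage of samples whose similarity score is at least tau.

    Assumes each entry is a similarity between a generated output and the
    attack target; this interpretation of tau is not confirmed by the source.
    """
    if not similarities:
        raise ValueError("similarities must be non-empty")
    hits = sum(1 for s in similarities if s >= tau)
    return 100.0 * hits / len(similarities)


# Example with made-up scores: two of the four samples reach the threshold.
example_asr = attack_success_rate([0.9, 0.8, 0.85, 0.5])
```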